Abstract. We prove that bootstrap type Monte Carlo particle filters approximate the optimal 
nonlinear filter in a time average sense uniformly with respect to the time horizon when the signal is 
ergodic and the particle system satisfies a tightness property. The latter is satisfied without further 
assumptions when the signal state space is compact, as well as in the noncompact setting when the 
signal is geometrically ergodic and the observations satisfy additional regularity assumptions. 



1. Introduction 
Consider a hidden Markov model of the form 

$$X_n = f(X_{n-1},\xi_n),\qquad Y_n = h(X_n,\eta_n),$$

where $(\xi_n)_{n\ge1}$, $(\eta_n)_{n\ge0}$ are independent i.i.d. sequences. The signal $X_n$ represents a dynamical process of interest, but only the noisy observations $Y_n$ are available. More generally, $(X_n)_{n\ge0}$ may be any Markov process and $(Y_n)_{n\ge0}$ are assumed to be conditionally independent given $(X_n)_{n\ge0}$. Such models appear in a wide variety of applications (see, e.g., [11]). As the signal is not directly observed, one is generally faced with the problem of estimating the signal on the basis of the observations. To this end, the nonlinear filtering problem aims to compute the conditional distribution $\pi_n$ of the signal $X_n$ given the observation history $Y_0,\ldots,Y_n$ in a recursive (on-line) fashion.

The theory of nonlinear filtering is a classic topic in probability [20] and statistics [2]. Unfortunately, the theory suffers in practice from the fact that the conditional distribution $\pi_n$ is an infinite dimensional object. With the exception of some special cases, the filtering recursion cannot be represented in a finite dimensional fashion and its direct implementation is therefore intractable. For this reason, realistic applications have long remained limited.

This state of affairs was revolutionized in the early 1990s by the discovery [12] of a new class of approximate nonlinear filtering algorithms based on Monte Carlo simulation, which are known under various names in the literature: bootstrap filters, interacting particle filters, sequential Monte Carlo filters, etc. Such algorithms are simple to implement (even for complex models), are computationally tractable, typically exhibit excellent performance, and can be rigorously proved to converge to 
the exact nonlinear filter when the number of samples is large. These techniques have consequently 
been applied in problems ranging from robotics to finance, and their theoretical properties have 
been investigated by many authors; we refer to the collection [11] for a general introduction to 
the theory and applications of Monte Carlo particle filters, while a detailed overview of theoretical 
developments can be found in the recent monographs [Qd]. 

Despite many advances in recent years, however, certain empirically observed properties of Monte 
Carlo particle filters remain poorly understood theoretically. The aim of this paper is to study one 
such property: the uniform nature of the particle filter approximation. 
Figure 1. The conditional mean $\mathbf{E}(X_n|Y_0,\ldots,Y_n)$ (red) and approximations by the bootstrap (blue) and naive (green) particle filter for a single sample path of the model described in the text. The number of particles N used for the approximate filters varies in each plot.

1.1. A toy example. The uniform nature of particle filter approximations is most easily illustrated 
by means of a simple but illuminating numerical example. Let us consider the filtering model 

$$X_n = 0.9\,X_{n-1} + \xi_n,\qquad X_0 = 0,\qquad Y_n = X_n + \eta_n,$$

where $\xi_n,\eta_n$ are i.i.d. $N(0,1)$. As only the observations are available to us, we aim to compute the conditional mean of the signal $\mathbf{E}(X_n|Y_0,\ldots,Y_n)$. In this very special case, it is well known that the latter can be computed exactly using a finite dimensional algorithm (the Kalman filter).
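For concreteness, the exact benchmark can be sketched in a few lines. The Python sketch below implements the textbook scalar Kalman recursion (prediction followed by a Bayes update) for this particular model; the function names `simulate` and `kalman_means`, the seed and the parameter names are our own illustrative choices and are not taken from the paper.

```python
import random


def simulate(n_steps, a=0.9, seed=0):
    """Simulate X_n = a X_{n-1} + xi_n, X_0 = 0, Y_n = X_n + eta_n (unit variances)."""
    rng = random.Random(seed)
    x, xs, ys = 0.0, [], []
    for n in range(n_steps):
        if n > 0:
            x = a * x + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, 1.0))
    return xs, ys


def kalman_means(ys, a=0.9, q=1.0, r=1.0, m0=0.0, p0=0.0):
    """Exact E(X_n | Y_0, ..., Y_n) when X_0 ~ N(m0, p0) (p0 = 0 means X_0 = m0),
    the signal noise has variance q and the observation noise has variance r."""
    m, p = m0, p0  # mean and variance of the one step predictor
    means = []
    for n, y in enumerate(ys):
        if n > 0:
            m, p = a * m, a * a * p + q   # prediction step
        gain = p / (p + r)                # update step (Bayes formula)
        m, p = m + gain * (y - m), (1.0 - gain) * p
        means.append(m)
    return means


xs, ys = simulate(200)
means = kalman_means(ys)
print(means[0])  # 0.0, since X_0 = 0 is known exactly
```

Only the conditional mean and variance need to be propagated, which is what makes this special case finite dimensional.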

A numerical simulation of this example is shown in figure 1, where we have plotted the exact conditional mean and its approximation obtained by means of the bootstrap particle filter. For the sake of illustration, we have also plotted a different 'naive' Monte Carlo approximation of the conditional mean which, like the bootstrap filter, is easily proved to converge to the exact conditional mean when the number of Monte Carlo particles is large. [The precise details of these algorithms will be given in section 3 below, and are irrelevant to the present discussion.] Though both algorithms converge, the difference in performance between the two algorithms is striking: the approximation error of the naive algorithm grows rapidly in time, while the error of the bootstrap algorithm appears to be independent of time (see [8] for further computations in this example).

Evidently the fact that both algorithms converge does not capture the key qualitative advantage 
of the bootstrap filter over the naive algorithm: the bootstrap filter converges to the exact filter 
uniformly in time, while the naive filter does not. Even if in practice the filter is only of interest 
on a finite time horizon, the rapid growth of the error of the naive filter is a severe problem as 
the filter becomes useless after relatively few time steps. In contrast, uniform convergence of the 
bootstrap filter indicates that its approximation error does not accumulate over time, which is 
essential for robust performance. It is therefore of considerable practical interest to establish under 
what conditions approximate filtering algorithms converge uniformly in time. 

The linear example considered here is very special in that the filter can be computed exactly. 
One would therefore never use a particle filter in this setting. We have chosen an example which 
admits an exact solution as this provides a benchmark with which we can compare the performance 
of particle filter approximations. On the other hand, exactly the same phenomenon as is illustrated 
in figure 1 is observed numerically in almost any ergodic filtering problem. A general understanding 
of this phenomenon is therefore essential in order to guarantee reliable performance of approximate 
filtering algorithms in nonlinear filtering problems, which almost never admit an exact solution. 
The aim of this paper is to establish uniform convergence of approximate filtering algorithms, and 
in particular of particle filters, for a large class of ergodic filtering models. 

In the following discussion we denote by $\pi_n$ the conditional distribution of $X_n$ given the observation history, and by $\pi_n^N$ its particle filter approximation with N particles. Both are computed recursively, which we denote as $\pi_n = F(Y_n,\pi_{n-1}) := F_n\pi_{n-1}$ and $\pi_n^N = F^N(Y_n,\pi_{n-1}^N) := F_n^N\pi_{n-1}^N$.

1.2. Previous work. Much of what is known about uniform convergence of the particle filter has its origins in the work of Del Moral and Guionnet [7], who established a fundamental connection with filter stability. The basic idea of this approach is as follows. The difference between the approximate and exact filter can be written as a telescoping sum (setting for simplicity $\pi_0^N = \pi_0$)

$$\pi_n^N - \pi_n = \sum_{k=1}^n\left\{F_n\cdots F_{k+1}F_k^N\pi_{k-1}^N - F_n\cdots F_{k+1}F_k\pi_{k-1}^N\right\}.$$

Suppose the filter is geometrically stable in the following sense:

$$\|F_n\cdots F_{k+1}\mu - F_n\cdots F_{k+1}\nu\| \le C\beta^{n-k}\,\|\mu-\nu\|, \tag{1}$$

where $\|\cdot\|$ is a suitable norm on probability measures and $C<\infty$, $\beta<1$ are constants. Then

$$\|\pi_n^N - \pi_n\| \le \sum_{k=1}^n C\beta^{n-k}\,\|F_k^N\pi_{k-1}^N - F_k\pi_{k-1}^N\| \lesssim \sum_{k=1}^n \frac{C\beta^{n-k}}{\sqrt N} \le \frac{C}{(1-\beta)\sqrt N},$$

where we have used the fact that one time step of the approximate filtering algorithm $F_k^N$ introduces an approximation error of order $O(N^{-1/2})$ and that the sum over $\beta^{n-k}$ is uniformly bounded. Thus, evidently, the particle filter is uniformly convergent at a rate $O(N^{-1/2})$.
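The role of the geometric factor $\beta^{n-k}$ can be seen by simply evaluating the right-hand side $\sum_{k=1}^n C\beta^{n-k}/\sqrt N$ of the bound above. The Python sketch below does this under the illustrative assumption that every one-step error equals exactly $C/\sqrt N$; the values $C=1$, $\beta=0.5$, $N=100$ are arbitrary, and setting $\beta=1$ mimics the absence of filter stability.

```python
import math


def accumulated_error_bound(n, N, C=1.0, beta=0.5):
    """Evaluate sum_{k=1}^n C * beta**(n - k) / sqrt(N).

    Each one-step error is taken to be C / sqrt(N) (an illustrative assumption);
    beta < 1 models a geometrically stable filter, beta = 1 no stability at all."""
    return sum(C * beta ** (n - k) for k in range(1, n + 1)) / math.sqrt(N)


N = 100
for n in (1, 10, 100, 1000):
    stable = accumulated_error_bound(n, N, beta=0.5)
    unstable = accumulated_error_bound(n, N, beta=1.0)
    print(f"n={n:5d}  stable={stable:.4f}  unstable={unstable:.4f}")
# With beta = 0.5 the values never exceed C / ((1 - beta) * sqrt(N)) = 0.2,
# whereas with beta = 1 the accumulated error grows linearly in n.
```

Without stability the one-step errors simply add up, which is the mechanism behind the growing error of the naive filter in figure 1.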

In order to establish the geometric stability property (1) of the filter, Del Moral and Guionnet impose the mixing assumption $\varepsilon\rho(A)\le\mathbf{P}(X_k\in A|X_{k-1})\le\varepsilon^{-1}\rho(A)$ on the signal transition probabilities (for some constant $\varepsilon>0$ and probability measure $\rho$), which was originally considered in the filter stability context by Atar and Zeitouni [1]. This is a very strong assumption, more stringent even than uniform ergodicity [21, theorem 16.0.2] of the signal process, and is very difficult to satisfy 
in practice particularly when the signal state space is not compact. Though various methods 
have been proposed to extend the class of models to which the mixing assumption is applicable, 
essentially all subsequent work on uniform convergence of the particle filter [181 033 IM1 HI [231 E2] 
has ultimately relied on a form of this strong assumption. Unfortunately, the necessary assumptions 
are not satisfied in many (if not most) models encountered in applications, so that the practical 
applicability of the results established to date remains rather limited. 

In a sense this conclusion is rather surprising, considering that significant progress has been 
made in recent years in the understanding of the filter stability problem (see [5] for an extensive 
review of this topic). For example, Kleptsyna and Veretennikov [15] have recently established 
geometric stability $\|F_n\cdots F_{k+1}\mu - F_n\cdots F_{k+1}\nu\| \le C(\mu,\nu,Y_{[0,\infty[})\,\beta^{n-k}$ for a particular class of non-uniformly ergodic filtering models (see also [H [10] for further variations of this approach), while it has been shown that qualitative stability $\|F_n\cdots F_{k+1}\mu - F_n\cdots F_{k+1}\nu\|\to0$ as $n\to\infty$ a.s. already holds under minimal ergodicity assumptions on the signal [29] or under no assumptions at all on the signal if the observations are informative [28]. The difficulty in applying such results to the uniform convergence problem is that the argument above requires the constants in (1) to be independent of both the initial measures $\mu,\nu$ and the observation path $Y_{[0,\infty[}$, which is generally not the case when the signal is not uniformly ergodic. Despite the considerable progress on the filter stability problem, the results cited above provide little control over the dependence of the constant on the initial measures. This presents a significant hurdle in applying these results to the uniform convergence problem.
An entirely different approach for proving uniform convergence properties of particle filters was developed by Budhiraja and Kushner [3] by exploiting certain ergodic properties of nonlinear filters. Filter stability still plays an important role in establishing the ergodic theory, but only qualitative stability results are needed, in contrast with the quantitative control over the convergence rate and constants needed in the approach of Del Moral and Guionnet. Using the recent filter stability results established in [29], the necessary ergodic properties can now be established under extremely mild ergodic assumptions on the signal process. In this paper we revisit the approach of Budhiraja and Kushner and provide a new set of assumptions for the uniform time average convergence of bootstrap-type particle filters in the following sense ($\|\cdot\|_{BL}$ is the dual bounded-Lipschitz norm):

$$\lim_{N\to\infty}\sup_{T\ge1}\mathbf{E}\left[\frac1T\sum_{k=1}^T\|\pi_k^N-\pi_k\|_{BL}\right] = 0.$$

It should be noted that the time average convergence is weaker than uniform convergence established 
by Del Moral and Guionnet; moreover, this approach does not supply a rate of convergence. On 
the other hand, we are able to demonstrate convergence for a class of non-uniformly ergodic signals 
which are presently still out of reach of the more quantitative theory. 

1.3. Organization of the paper. In section 2 we introduce the basic nonlinear filtering problem. We then develop a general framework for uniform time average approximation of the nonlinear filter. In section 3 we introduce the bootstrap Monte Carlo filtering algorithm and discuss its basic properties. We show that the theory of section 2 can be applied to the bootstrap filter, provided that a suitable tightness property can be established. In section 4 we develop two classes of sufficient conditions for the requisite tightness property to hold. Both presume that the signal is geometrically ergodic, but different regularity assumptions on the observations are required in the two cases to complete the proof. Finally, appendix A recalls some basic facts about weak convergence, while most proofs in the text are postponed to appendix B.



2. A General Approximation Theorem

The purpose of this section is to introduce the nonlinear filtering problem, and to establish a general framework for its approximation uniformly in time average (not necessarily by a particle filter). The approach of this section follows closely the ideas of Kushner and Huang [17] and of Budhiraja and Kushner [3], but here we have significantly simplified the proofs, generalized the notion of convergence and eliminated some technical assumptions. Our treatment is mostly self-contained, but we have postponed the proofs to appendix B.

2.1. The hidden Markov model and nonlinear filter. Let $(E,\mathcal{B}(E))$ and $(F,\mathcal{B}(F))$ be Polish spaces endowed with their Borel σ-fields, let $P:E\times\mathcal{B}(E)\to[0,1]$ and $\Phi:E\times\mathcal{B}(F)\to[0,1]$ be given transition probability kernels, and let $\mu:\mathcal{B}(E)\to[0,1]$ be a given probability measure. We will work with random variables $(X_k,Y_k)_{k\ge0}$, defined on an underlying probability space $(\Omega,\mathcal{F},\mathbf{P})$, such that $(X_n)_{n\ge0}$ is a Markov chain with initial measure $X_0\sim\mu$ and transition probability $P$, and such that $(Y_n)_{n\ge0}$ are conditionally independent given $(X_n)_{n\ge0}$ with $\mathbf{P}(Y_n\in A|X_n)=\Phi(X_n,A)$. Such a model can always be constructed in a canonical fashion, and is called a hidden Markov model with initial measure $\mu$, transition kernel $P$ and observation kernel $\Phi$.

We will make the following nondegeneracy assumption on the observation kernel.

Assumption 1 (Nondegeneracy). There is a σ-finite measure $\varphi:\mathcal{B}(F)\to\mathbb{R}_+$ and a strictly positive measurable function $\Upsilon:E\times F\to\,]0,\infty[$ such that

$$\Phi(x,A)=\int\mathbf{1}_A(y)\,\Upsilon(x,y)\,\varphi(dy)\quad\text{for all } x\in E,\ A\in\mathcal{B}(F).$$
We now define the probability kernels $\pi_{k-}:F^k\times\mathcal{B}(E)\to[0,1]$ and $\pi_k:F^{k+1}\times\mathcal{B}(E)\to[0,1]$ by the following recursion: for all $y_0,\ldots,y_k\in F$ and $A\in\mathcal{B}(E)$, we have

$$\pi_{k-}(y_{0\ldots k-1},A)=\int P(x,A)\,\pi_{k-1}(y_{0\ldots k-1},dx),\qquad \pi_k(y_{0\ldots k},A)=\frac{\int\mathbf{1}_A(x)\,\Upsilon(x,y_k)\,\pi_{k-}(y_{0\ldots k-1},dx)}{\int\Upsilon(x,y_k)\,\pi_{k-}(y_{0\ldots k-1},dx)},$$

with the initial condition $\pi_{0-}(A)=\mu(A)$. Then it is well known that by the Bayes formula,

$$\mathbf{P}(X_k\in A|Y_0,\ldots,Y_{k-1})=\pi_{k-}(Y_{0\ldots k-1},A),\qquad \mathbf{P}(X_k\in A|Y_0,\ldots,Y_k)=\pi_k(Y_{0\ldots k},A).$$

For notational convenience we will simply write $\pi_{k-}(A)=\pi_{k-}(Y_{0\ldots k-1},A)$ and $\pi_k(A)=\pi_k(Y_{0\ldots k},A)$. The kernel $\pi_k$ is called the nonlinear filter and $\pi_{k-}$ is the one step predictor associated with the hidden Markov model $(X_k,Y_k)_{k\ge0}$. Unfortunately, these infinite dimensional quantities are typically not explicitly computable. We aim to obtain a computationally tractable approximation.
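When E is a finite set, the recursion above reduces to a matrix-vector product followed by a pointwise reweighting, and can be computed exactly. The Python sketch below assumes $E=\{0,\ldots,m-1\}$, represents measures as lists of probabilities and $P$ as a row-stochastic matrix; the two-state example, its numbers and the function names are hypothetical and serve only as an illustration.

```python
def predict(pi, P):
    """One step predictor: (pi P)(j) = sum_i pi(i) P(i, j)."""
    m = len(P)
    return [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]


def correct(pi_pred, lik):
    """Bayes step: multiply by lik[x] = Upsilon(x, y_k) and renormalize."""
    unnorm = [p * l for p, l in zip(pi_pred, lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def exact_filter(mu, P, upsilon, ys):
    """Return [pi_0, ..., pi_n] given the observations ys = [y_0, ..., y_n]."""
    states = range(len(mu))
    pi_pred = list(mu)  # pi_{0-} = mu
    filters = []
    for k, y in enumerate(ys):
        if k > 0:
            pi_pred = predict(filters[-1], P)  # pi_{k-} = pi_{k-1} P
        filters.append(correct(pi_pred, [upsilon(x, y) for x in states]))
    return filters


def upsilon(x, y):
    """Binary channel reporting the true state with probability 0.8
    (phi is the counting measure on F = {0, 1})."""
    return 0.8 if x == y else 0.2


P = [[0.9, 0.1], [0.2, 0.8]]
filters = exact_filter([0.5, 0.5], P, upsilon, [0, 0, 1])
print([round(p[0], 4) for p in filters])  # filter probabilities of state 0
```

In this finite setting the filter is a vector of length m; the difficulty addressed in this paper arises precisely when E is a general Polish space and no such finite representation exists.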

2.2. Markov and ergodic properties. In the following, we denote by $\mathcal{P}(E)$ the space of probability measures on $(E,\mathcal{B}(E))$ endowed with the topology of weak convergence of probability measures and the associated Borel σ-field. We define on $\mathcal{P}(E)$ the probability distances

$$\|\nu-\nu'\|_{BL}=\sup_{f\in\mathrm{Lip}(E)}\left|\int f\,d\nu-\int f\,d\nu'\right|,\qquad \|\nu-\nu'\|_{TV}=\sup_{\|f\|_\infty\le1}\left|\int f\,d\nu-\int f\,d\nu'\right|,$$

where we have defined $\mathrm{Lip}(E)=\{f:\|f\|_\infty\le1,\ \|f\|_L\le1\}$ and $\|f\|_L$ is the Lipschitz constant of $f$. The dual bounded-Lipschitz distance $\|\cdot\|_{BL}$ metrizes the weak convergence topology on $\mathcal{P}(E)$, while the total variation distance $\|\cdot\|_{TV}$ is strictly stronger.
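A standard example illustrates the difference between the two distances. For Dirac masses on $E=\mathbb{R}$ one has $\|\delta_a-\delta_b\|_{TV}=2$ whenever $a\ne b$, while $\|\delta_a-\delta_b\|_{BL}=\min(|a-b|,2)$: every $f\in\mathrm{Lip}(\mathbb{R})$ satisfies $|f(a)-f(b)|\le\min(|a-b|,2)$, and this value is attained by a clipped linear function. The short Python check below (our own illustration, with hypothetical function names) confirms that $\delta_h\to\delta_0$ weakly as $h\to0$ although the total variation distance does not vanish.

```python
def tv_dirac(a, b):
    """||delta_a - delta_b||_TV with the normalization sup over |f| <= 1."""
    return 0.0 if a == b else 2.0


def bl_dirac(a, b):
    """||delta_a - delta_b||_BL on the real line with the usual metric."""
    return min(abs(a - b), 2.0)


def witness(a, b):
    """A function in Lip(R) (bounded by 1, Lipschitz constant 1) attaining the BL supremum."""
    mid = 0.5 * (a + b)
    return lambda x: max(-1.0, min(1.0, x - mid))


for h in (3.0, 1.0, 0.1, 0.01):
    f = witness(0.0, h)
    attained = abs(f(h) - f(0.0))
    print(h, tv_dirac(0.0, h), bl_dirac(0.0, h), attained)
```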

Let us recall that any probability kernel $\nu:\Omega\times\mathcal{B}(E)\to[0,1]$ can equivalently be viewed as a $\mathcal{P}(E)$-valued random variable on the measure space $\Omega$ (see, e.g., [14, lemma 1.40]). In particular, we may consider the filter $(\pi_k)_{k\ge0}$ to be a $\mathcal{P}(E)$-valued stochastic process adapted to the filtration $\mathcal{F}_k^Y=\sigma\{Y_0,\ldots,Y_k\}$. It is well known that this process possesses the Markov property, see, e.g., [26], and the associated ergodic theory will play a key role in the following.

Assumption 2 (Ergodicity). $(X_k)_{k\ge0}$ is positive Harris recurrent and aperiodic, i.e., there is a (unique) $P$-invariant measure $\lambda\in\mathcal{P}(E)$ such that $\|\nu P^k-\lambda\|_{TV}\to0$ as $k\to\infty$ for every $\nu\in\mathcal{P}(E)$.

When assumption 1 holds, we may define the update map $\mathsf{U}:F\times\mathcal{P}(E)\to\mathcal{P}(E)$ as

$$\mathsf{U}(y,\pi)(A)=\frac{\int\mathbf{1}_A(x)\,\Upsilon(x,y)\,\pi(dx)}{\int\Upsilon(x,y)\,\pi(dx)}.$$

The following result collects the various properties of the filter that will be used below. 

Proposition 2.1. Suppose that assumption 1 holds. Then the $E\times\mathcal{P}(E)$-valued stochastic process $(X_k,\pi_k)_{k\ge0}$ is Markov with transition kernel $\Pi:E\times\mathcal{P}(E)\times\mathcal{B}(E\times\mathcal{P}(E))\to[0,1]$,

$$\int f(x',\pi')\,\Pi(x,\pi,dx',d\pi')=\int f(x',\mathsf{U}(y,\pi P))\,\Upsilon(x',y)\,\varphi(dy)\,P(x,dx'),$$

and initial measure $M\in\mathcal{P}(E\times\mathcal{P}(E))$,

$$\int f(x,\pi)\,M(dx,d\pi)=\int f(x,\mathsf{U}(y,\mu))\,\Upsilon(x,y)\,\varphi(dy)\,\mu(dx).$$

Moreover, if assumption 2 holds, then $\Pi$ possesses a unique invariant measure $\Lambda\in\mathcal{P}(E\times\mathcal{P}(E))$.

The proof is given in appendix B.1. Let us remark that the Markov property is elementary, while uniqueness of the invariant measure hinges on recent progress on the filter stability problem [29].
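To make the kernel $\Pi$ concrete, the Python sketch below simulates the pair $(X_k,\pi_k)$ for a hypothetical two-state hidden Markov model with binary observations ($\varphi$ the counting measure on $F=\{0,1\}$, so that $\Upsilon(x,y)=\Phi(x,\{y\})$). Each step draws $x'\sim P(x,\cdot)$, then $y\sim\Phi(x',\cdot)$, and sets $\pi'=\mathsf{U}(y,\pi P)$, as in the formula for $\Pi$. For this chain $\lambda(\{0\})=2/3$, and the time average of $\pi_k(\{0\})$ should settle near that value; this is a numerical illustration of ergodic behavior only, under our own choice of model and seed.

```python
import random


def pi_step(x, pi, P, Phi, rng):
    """One transition of Pi: x' ~ P(x, .), y ~ Phi(x', .), pi' = U(y, pi P)."""
    m = len(P)
    x_new = rng.choices(range(m), weights=P[x])[0]
    y = rng.choices(range(len(Phi[x_new])), weights=Phi[x_new])[0]
    pred = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]  # pi P
    unnorm = [pred[j] * Phi[j][y] for j in range(m)]  # Upsilon(j, y) = Phi[j][y]
    total = sum(unnorm)
    return x_new, [u / total for u in unnorm]


P = [[0.9, 0.1], [0.2, 0.8]]      # invariant measure lambda = (2/3, 1/3)
Phi = [[0.8, 0.2], [0.2, 0.8]]    # strictly positive, as required by assumption 1
rng = random.Random(1)
x, pi = 0, [1.0, 0.0]             # an arbitrary starting point in E x P(E)
steps, running = 20000, 0.0
for _ in range(steps):
    x, pi = pi_step(x, pi, P, Phi, rng)
    running += pi[0]
print(running / steps)  # expected to be close to 2/3
```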
2.3. A general approximation theorem. As the filter $\pi_k$ cannot be computed exactly in practice, we aim to approximate it by a sequence of computationally tractable approximate filters $\pi_k^N$ ($N\in\mathbb{N}$), such that $\pi_k^N\to\pi_k$ as $N\to\infty$. The goal of this section is to investigate what assumptions should be imposed on the filter approximations so that they converge to the exact filter uniformly in time average. We will subsequently apply this result to the setting where $\pi_k^N$ is a bootstrap type Monte Carlo particle filter with N particles. However, the results of this section are general and could also be applied to other types of filter approximation.

We have seen in the previous section that $(\pi_k)_{k\ge0}$ is a $\mathcal{P}(E)$-valued $\mathcal{F}_k^Y$-adapted process, such that $(X_k,\pi_k)_{k\ge0}$ is Markov. We will consider approximate filters of a similar type, but we allow them to be adapted to a slightly larger filtration. This is needed to account for the random sampling step in Monte Carlo particle filters, which introduces additional randomness into the algorithm.

Assumption 3 (Approximation). For every $N\in\mathbb{N}$, the process $(\pi_k^N)_{k\ge0}$ satisfies the following.

(1) $(\pi_k^N)_{k\ge0}$ is a $\mathcal{P}(E)$-valued $\mathcal{F}_k^Y\vee\mathcal{G}$-adapted process, where $\mathcal{G}$ is independent of $(X_k,Y_k)_{k\ge0}$.

(2) $(X_k,\pi_k^N)_{k\ge0}$ is Markov with transition kernel $\Pi_N$ and initial measure $M_N$.

We obtain the following general approximation theorem. 

Theorem 2.2. Suppose that assumptions 1–3 hold. Moreover, we make the following one step convergence and tightness assumptions on the approximating sequence.

(1) For any bounded continuous $F:E\times\mathcal{P}(E)\to\mathbb{R}$ and $x_N\to x$, $\nu_N\to\nu$ as $N\to\infty$, we have

$$\int F(x',\nu')\,\Pi_N(x_N,\nu_N,dx',d\nu')\xrightarrow{N\to\infty}\int F(x',\nu')\,\Pi(x,\nu,dx',d\nu').$$

In addition, we have $M_N\to M$ weakly as $N\to\infty$.

(2) For any sequence $T_N\nearrow\infty$ as $N\to\infty$, the family of probability measures

$$\mathcal{E}_N(A)=\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}\pi_k^N(A)\right],\qquad N\ge1,$$

is tight.

Then the sequence $(\pi_k^N)_{k\ge0}$ converges to $(\pi_k)_{k\ge0}$ as $N\to\infty$ uniformly in time average:

$$\lim_{N\to\infty}\sup_{T\ge1}\mathbf{E}\left[\frac1T\sum_{k=1}^T\|\pi_k^N-\pi_k\|_{BL}\right]=0.$$

The proof of this theorem is given in appendix B.2.

Let us note that the uniform time average convergence guaranteed by the theorem allows us to 
answer related convergence questions as well. For example, we can prove that the time average 
mean square error of the estimates obtained from the approximate filter converges to the time 
average mean square error of the estimates obtained from the exact filter, uniformly in time. 



Corollary 2.3. Suppose that the assumptions of theorem 2.2 are satisfied. Then

$$\lim_{N\to\infty}\sup_{T\ge1}\left|\mathbf{E}\left[\frac1T\sum_{k=1}^T\left(f(X_k)-\int f\,d\pi_k^N\right)^2\right]-\mathbf{E}\left[\frac1T\sum_{k=1}^T\left(f(X_k)-\int f\,d\pi_k\right)^2\right]\right|=0$$

for any bounded continuous function $f$. The proof is given in appendix B.3.



Remark 2.4. The one step convergence assumption. The first condition of theorem 2.2 ensures that 
the approximate filter converges to the exact filter on any finite time horizon (lemma B.3). This is 
certainly a minimal requirement for convergence, and is typically easily verified in practice. 
Remark 2.5. The tightness assumption. The second condition of theorem 2.2 ensures, roughly speaking, that the approximate filter does not lose mass to infinity after a long time (at least on average with respect to time and the observations). This is certainly the case for the signal itself by assumption 2, and this property is inherited by the exact filter by virtue of lemma A.2. Tightness of the approximate filter is not automatic, however, and needs to be imposed separately. Though this, too, is arguably a minimal assumption to ensure convergence of the approximate filters, the tightness property appears to be much more difficult to demonstrate in practice. Indeed, this is the main difficulty in applying theorem 2.2 to Monte Carlo particle filters.

An exception is the case where the signal state space E is compact; we state this as a lemma, 
though the result is entirely obvious and requires no proof. 

Lemma 2.6. If E is compact, then the second condition of theorem 2.2 is automatically satisfied.

In the compact setting, however, the generality of the ergodic assumption 2 is slightly misleading. Indeed, note that the first condition of theorem 2.2 implies that the signal transition kernel P is Feller. Therefore, under the mild assumption that the support of the signal invariant measure $\lambda$ has nonempty interior, compactness of the state space implies that the signal is even uniformly ergodic [21, theorem 16.2.5 and theorem 6.2.9]. Moreover, if we assume that $x\mapsto\Upsilon(x,y)$ is continuous for every y (as we will do in order to prove the first condition of theorem 2.2), assumption 1 and compactness of E imply that $\Upsilon(\cdot,y)$ is bounded away from zero for every y. In this setting, uniform convergence could be studied more directly using the techniques in [7].

When E is not compact, a sufficient condition for tightness is the following. 

Lemma 2.7. If the family $\{\mathbf{E}\pi_k^N : k,N\ge1\}$ is tight, the second condition of theorem 2.2 holds.
We omit the proof, which is straightforward. 



3. The Bootstrap Particle Filter 

The practical problem in implementing the exact filter is that the conditional distribution $\pi_k$ is an infinite dimensional object. In applying the theory, one must therefore seek finite dimensional approximations. The idea behind particle filters is to approximate the nonlinear filter by atomic measures with a fixed number of particles $N\in\mathbb{N}$, i.e., by measures in the space

$$\mathcal{P}_N(E)=\left\{\sum_{i=1}^N w_i\,\delta_{x_i} : x_1,\ldots,x_N\in E,\ w_1,\ldots,w_N\ge0,\ \sum_{i=1}^N w_i=1\right\}\subset\mathcal{P}(E).$$

Note that the filtering recursion does not naturally leave the set $\mathcal{P}_N(E)$ invariant; therefore, approximation is unavoidable. The bootstrap particle filter introduces an additional sampling step in the filtering recursion to project the filter back into the set $\mathcal{P}_N(E)$.

To be precise, define the sampling transition kernel $R_N:\mathcal{P}(E)\times\mathcal{B}(\mathcal{P}(E))\to[0,1]$ as

$$\int F(\nu)\,R_N(\rho,d\nu)=\int F\!\left(\frac1N\sum_{i=1}^N\delta_{x_i}\right)\rho(dx_1)\cdots\rho(dx_N).$$

Then $R_N(\rho,\cdot)$ is the law of a $\mathcal{P}(E)$-valued random variable $\hat\rho$ that is generated as follows:

(1) Sample N i.i.d. random variables $X^1,\ldots,X^N$ from $\rho$.

(2) Set $\hat\rho=\frac1N\{\delta_{X^1}+\cdots+\delta_{X^N}\}$.
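When $\rho$ is itself a discrete measure $\sum_i w_i\delta_{x_i}$, the two steps above amount to multinomial resampling. The Python sketch below (function name hypothetical) returns the empirical measure as a dictionary from atoms to masses. In the bootstrap algorithm the measure $\pi_{k-1}^N P$ is sampled in the same spirit, by first resampling the atoms of $\pi_{k-1}^N$ and then moving each resampled particle with $P$.

```python
import random
from collections import Counter


def sample_R_N(atoms, weights, N, rng):
    """Draw one realization of R_N(rho, .) for rho = sum_i weights[i] delta_{atoms[i]}."""
    draws = rng.choices(atoms, weights=weights, k=N)  # step (1): N i.i.d. draws from rho
    counts = Counter(draws)
    return {atom: c / N for atom, c in counts.items()}  # step (2): mass 1/N per draw


rng = random.Random(0)
rho_hat = sample_R_N([0.0, 1.0], [0.7, 0.3], N=10000, rng=rng)
print(rho_hat)  # roughly {0.0: 0.7, 1.0: 0.3}
```

By construction $\mathbf{E}\,\hat\rho(A)=\rho(A)$ for every set $A$, so the sampling step is unbiased; its fluctuations are of order $N^{-1/2}$ by the law of large numbers.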

We now introduce the transition kernel for the bootstrap particle filter as

$$\int f(x',\pi')\,\Pi_N(x,\pi,dx',d\pi')=\int f(x',\mathsf{U}(y,\pi'))\,R_N(\pi P,d\pi')\,\Upsilon(x',y)\,\varphi(dy)\,P(x,dx'),$$

and we define the initial measure for the bootstrap particle filter as

$$\int f(x,\pi)\,M_N(dx,d\pi)=\int f(x,\mathsf{U}(y,\pi))\,R_N(\mu,d\pi)\,\Upsilon(x,y)\,\varphi(dy)\,\mu(dx).$$

Note, in particular, that by construction $M_N$ and $\Pi_N(x,\pi,\cdot)$ are supported on $E\times\mathcal{P}_N(E)$ for any $x,\pi$, so that the bootstrap particle filter is indeed finite dimensional in nature. Moreover, the law of large numbers strongly suggests convergence to the exact filter as $N\to\infty$, at least on finite time intervals; we will make this precise below by verifying the first condition of theorem 2.2.

Algorithm 1: Bootstrap Filtering Algorithm
  Sample i.i.d. $x_0^i$, $i=1,\ldots,N$ from the initial distribution $\mu$;
  Compute $w_0^i=\Upsilon(x_0^i,Y_0)/\sum_{\ell=1}^N\Upsilon(x_0^\ell,Y_0)$, $i=1,\ldots,N$;
  Set $\pi_0^N=\sum_{i=1}^N w_0^i\,\delta_{x_0^i}$;
  for $k=1,\ldots,n$ do
    Sample i.i.d. $x_{k-1}^i$, $i=1,\ldots,N$ from the distribution $\pi_{k-1}^N$;
    Sample $x_k^i$ from $P(x_{k-1}^i,\cdot)$, $i=1,\ldots,N$;
    Compute $w_k^i=\Upsilon(x_k^i,Y_k)/\sum_{\ell=1}^N\Upsilon(x_k^\ell,Y_k)$, $i=1,\ldots,N$;
    Set $\pi_k^N=\sum_{i=1}^N w_k^i\,\delta_{x_k^i}$;
  end

We have not yet introduced an explicit construction of the random variables $(\pi_k^N)_{k\ge0}$ on the probability space $(\Omega,\mathcal{F},\mathbf{P})$. However, as all our state spaces are Polish, it is a standard fact (e.g., along the lines of [14, proposition 8.6]) that the joint process $(X_k,Y_k,\pi_k,\pi_k^N)_{k\ge0}$ can be obtained for any $N\ge1$ by a canonical construction, provided the probability space $(\Omega,\mathcal{F},\mathbf{P})$ carries a countable family of i.i.d. Unif(0,1) random variables $(\zeta_k)_{k\ge0}$ independent of $(X_k,Y_k)_{k\ge0}$. The random variables $(\zeta_k)_{k\ge0}$ provide the additional randomness introduced by the sampling steps in the bootstrap filtering algorithm, and the construction is such that $\pi_k^N$ is $\mathcal{F}_k^Y\vee\mathcal{G}$-adapted with $\mathcal{G}=\sigma\{\zeta_k:k\ge0\}$. As it will not be needed in what follows, the construction of $(X_k,Y_k,\pi_k,\pi_k^N)_{k\ge0}$ will be left implicit, but the details of the construction should be evident from the bootstrap filtering algorithm 1 (which is clearly very straightforward to implement in practice).
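As an illustration of how little code algorithm 1 requires, the Python sketch below runs it on the toy model of section 1.1 and compares the resulting estimates of $\mathbf{E}(X_n|Y_0,\ldots,Y_n)$ with the exact Kalman filter. It is a minimal sketch under our own choices (function names, seeds, $N=1000$, and a normalization of the weights in the log domain for numerical stability); $\Upsilon$ is taken to be the $N(0,1)$ likelihood up to a constant factor, which cancels in the normalized weights.

```python
import math
import random


def bootstrap_filter_means(ys, N, a=0.9, seed=0):
    """Algorithm 1 for X_n = a X_{n-1} + xi_n, X_0 = 0, Y_n = X_n + eta_n
    (standard normal noises); returns the estimates of E(X_n | Y_0, ..., Y_n)."""
    rng = random.Random(seed)
    particles = [0.0] * N            # N i.i.d. samples from mu = delta_0
    weights = [1.0 / N] * N
    means = []
    for n, y in enumerate(ys):
        if n > 0:
            # sample i.i.d. from pi^N_{n-1}, then move each particle with P(x, .)
            parents = rng.choices(particles, weights=weights, k=N)
            particles = [a * x + rng.gauss(0.0, 1.0) for x in parents]
        # log Upsilon(x, y) = -(y - x)^2 / 2 up to an additive constant
        logw = [-0.5 * (y - x) ** 2 for x in particles]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        weights = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(weights, particles)))
    return means


def simulate(n_steps, a=0.9, seed=0):
    """Observations Y_0, ..., Y_{n_steps - 1} of the toy model."""
    rng = random.Random(seed)
    x, ys = 0.0, []
    for n in range(n_steps):
        if n > 0:
            x = a * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, 1.0))
    return ys


def kalman_means(ys, a=0.9):
    """Exact conditional means (Kalman filter with X_0 = 0 and unit variances)."""
    m, p, means = 0.0, 0.0, []
    for n, y in enumerate(ys):
        if n > 0:
            m, p = a * m, a * a * p + 1.0
        gain = p / (p + 1.0)
        m, p = m + gain * (y - m), (1.0 - gain) * p
        means.append(m)
    return means


ys = simulate(300)
exact = kalman_means(ys)
approx = bootstrap_filter_means(ys, N=1000)
err = sum(abs(e - q) for e, q in zip(exact, approx)) / len(ys)
print(f"time-averaged error for N = 1000: {err:.3f}")
```

The resampling line is the only difference with the naive estimator of remark 3.1; it is exactly the application of the kernel $R_N$ defined above.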

Remark 3.1. A conceptually simpler naive particle filter could be constructed as follows. By the Bayes formula, the exact filter at time k can be expressed as

$$\pi_k(y_0,\ldots,y_k,A)=\frac{\mathbf{E}\big(\mathbf{1}_A(X_k)\,\Upsilon(X_k,y_k)\cdots\Upsilon(X_0,y_0)\big)}{\mathbf{E}\big(\Upsilon(X_k,y_k)\cdots\Upsilon(X_0,y_0)\big)}.$$

Therefore, by the law of large numbers, we can approximate $\pi_k$ as follows:

$$\pi_k(y_0,\ldots,y_k,A)\approx\frac{\sum_{i=1}^N\mathbf{1}_A(X_k^i)\,\Upsilon(X_k^i,y_k)\cdots\Upsilon(X_0^i,y_0)}{\sum_{i=1}^N\Upsilon(X_k^i,y_k)\cdots\Upsilon(X_0^i,y_0)},$$

where $(X_0^i,\ldots,X_k^i)$, $i=1,\ldots,N$ are i.i.d. samples from the law of $(X_0,\ldots,X_k)$. Indeed, by the law of large numbers, this approximation is immediately seen to converge to the exact filter as $N\to\infty$. However, as can be seen in the numerical example in figure 1, the convergence is not uniform in time, and in fact the performance is quite poor (see [8] for a theoretical perspective).
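The weight degeneracy behind the poor performance of the naive filter is easy to observe. The Python sketch below implements the estimator above for the toy model of section 1.1 and records the effective sample size $1/\sum_i (w^i)^2$ of the normalized weights, a common diagnostic that we introduce here only for illustration (it is not used elsewhere in the paper). As time increases, the weight typically concentrates on very few of the N paths.

```python
import math
import random


def naive_filter(ys, N, a=0.9, seed=0):
    """Naive estimator of remark 3.1 for X_n = a X_{n-1} + xi_n, X_0 = 0,
    Y_n = X_n + eta_n: N independent signal paths are weighted by
    Upsilon(X_n, y_n) ... Upsilon(X_0, y_0), without any resampling.
    Returns the estimates of E(X_n | Y_0..Y_n) and the effective sample sizes."""
    rng = random.Random(seed)
    xs = [0.0] * N        # current states X_n^i of the N paths
    logw = [0.0] * N      # log of the accumulated product of likelihoods
    means, ess = [], []
    for n, y in enumerate(ys):
        if n > 0:
            xs = [a * x + rng.gauss(0.0, 1.0) for x in xs]
        logw = [lw - 0.5 * (y - x) ** 2 for lw, x in zip(logw, xs)]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, xs)))
        ess.append(1.0 / sum(wi * wi for wi in w))  # between 1 and N
    return means, ess


# Observations from the toy model (same dynamics as in section 1.1).
rng = random.Random(42)
x, ys = 0.0, []
for n in range(200):
    if n > 0:
        x = 0.9 * x + rng.gauss(0.0, 1.0)
    ys.append(x + rng.gauss(0.0, 1.0))

means, ess = naive_filter(ys, N=1000)
print([round(e, 1) for e in ess[::40]])  # effective sample size at n = 0, 40, 80, ...
```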

Our aim is to prove that the bootstrap particle filter converges uniformly in time average. We will do this by verifying the conditions of theorem 2.2. Clearly assumption 3 holds by construction, while assumptions 1 and 2 on the filtering model will be presumed from the outset. We now show that the first condition of theorem 2.2 holds under a mild continuity assumption on the filtering model. Tightness is a much more difficult problem, and will be tackled in the next section.
Assumption 4 (Continuity). The following hold:

(1) P is Feller, i.e., $x\mapsto P(x,\cdot)$ is continuous;

(2) For every $y\in F$, the map $x\mapsto\Upsilon(x,y)$ is continuous and bounded.

Proposition 3.2. Suppose that assumptions 1 and 4 hold. Then the first condition of theorem 2.2 holds true for the bootstrap particle filter. In particular, $\mathbf{E}(\|\pi_k^N-\pi_k\|_{BL})\to0$ as $N\to\infty$ for any $k<\infty$.



The proof of this result is given in appendix B.4. From theorem 2.2 we immediately obtain:

Corollary 3.3. Suppose that assumptions 1, 2 and 4 hold, and that the family of probability measures

$$\mathcal{E}_N(A)=\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}\pi_k^N(A)\right],\qquad N\ge1,$$

is tight for any sequence $T_N\nearrow\infty$ as $N\to\infty$. Then

$$\lim_{N\to\infty}\sup_{T\ge1}\mathbf{E}\left[\frac1T\sum_{k=1}^T\|\pi_k^N-\pi_k\|_{BL}\right]=0$$

holds true for the bootstrap particle filter.



4. Sufficient Conditions for Tightness 

By corollary 3.3, all that remains to prove in order to establish uniform time average consistency of the bootstrap particle filter is the tightness of the particle system generated by the algorithm, i.e., we must rule out the possibility that the particle system loses mass to infinity after running for a long time. It seems intuitively plausible that this can be proved under rather general conditions, as both the signal and filter are already ergodic (see assumption 2 and [29]) and the sampling step in the bootstrap algorithm does not change the center of mass of the filter.

Unfortunately, the tightness problem appears to be much more difficult than one might expect. 
A rather ominous counterexample in a different setting [25] shows that, contrary to intuition, 
arbitrarily small perturbations may cause a Markov chain to become transient (and hence lose 
its tightness property) even when the unperturbed chain is geometrically ergodic. Though the 
implications for the present setting are unclear, such examples suggest that the problem may be delicate and that tightness cannot be taken for granted. In this section, we will provide two sets of general sufficient conditions under which tightness can be verified for the bootstrap particle filter. Both sets of conditions require geometric ergodicity of the signal (which is stronger than assumption 2), and each imposes a different set of restrictions on the observation structure.

Remark 4.1. Assumptions [1] and are very mild and are satisfied by the majority of ergodic 
filtering problems. In contrast, the sufficient conditions for tightness below are rather restrictive, 
and in this sense our results are not entirely satisfactory — establishing tightness under minimal 
ergodicity and observation assumptions remains an open problem. Nonetheless, the tightness property is purely qualitative and thus appears to be significantly more tractable than the quantitative controls required in other approaches to the uniform convergence problem (indeed, the general conditions imposed below are still out of reach of other approaches). Another interesting possibility 
is that tightness might be achieved by introducing suitable modifications to the bootstrap filtering 
algorithm, e.g., by means of a periodic resampling scheme or using some form of regularization. 

Let us briefly recall the relevant notion of geometric ergodicity. A function $V:E\to[1,\infty[$ is said to possess compact level sets if the set $\{x\in E:V(x)\le r\}$ is compact for every $r\ge1$. Given such a function V, we define the V-total variation distance between $\mu,\nu\in\mathcal{P}(E)$ as

$$\|\mu-\nu\|_V=\sup_{|f|\le V}\left|\int f\,d\mu-\int f\,d\nu\right|=\int V\,d|\mu-\nu|.$$
We will call the Markov chain $(X_k)_{k\ge0}$ geometrically ergodic if there is a function $V:E\to[1,\infty[$ with compact level sets, a $P$-invariant measure $\lambda$, and constants $C<\infty$ and $\beta<1$ such that

$$\|P^k(x,\cdot)-\lambda\|_V\le C\,V(x)\,\beta^k\quad\text{for all } x\in E.$$

Note that geometric ergodicity is strictly stronger than assumption 2. Geometric ergodicity is often 
easily verified in terms of Lyapunov-type conditions on the transition kernel and is satisfied in many 
practical applications; see the monograph [21] for an extensive development of this theory. 
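For the toy signal of section 1.1, $X_k=0.9X_{k-1}+\xi_k$, the relevant laws are explicit: $P^k(x,\cdot)=N(0.9^kx,(1-0.81^k)/0.19)$ and $\lambda=N(0,1/0.19)$. The Python sketch below takes the assumed Lyapunov function $V(y)=1+y^2$ (which has compact level sets) and evaluates $\|P^k(x,\cdot)-\lambda\|_V=\int V\,d|P^k(x,\cdot)-\lambda|$ by a trapezoid rule on a truncated grid; the truncation, grid size and starting point $x=3$ are our own choices. The printed ratios decay roughly like $0.9^k$, which is consistent with geometric ergodicity but is only a numerical illustration, not a proof.

```python
import math


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def v_distance(x, k, a=0.9, L=40.0, n_grid=20001):
    """||P^k(x, .) - lambda||_V = int V(y) |p_k(y) - l(y)| dy, V(y) = 1 + y^2,
    for the Gaussian AR(1) chain X' = a X + xi, xi ~ N(0, 1), and k >= 1."""
    mean_k = a ** k * x
    var_k = (1.0 - a ** (2 * k)) / (1.0 - a * a)  # law of X_k given X_0 = x
    var_inf = 1.0 / (1.0 - a * a)                 # invariant law N(0, var_inf)
    h = 2.0 * L / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        y = -L + i * h
        g = (1.0 + y * y) * abs(normal_pdf(y, mean_k, var_k) - normal_pdf(y, 0.0, var_inf))
        total += 0.5 * g if i in (0, n_grid - 1) else g  # trapezoid weights
    return total * h


x = 3.0
for k in (5, 10, 20, 40):
    d = v_distance(x, k)
    print(k, round(d / (1.0 + x * x), 5), round(0.9 ** k, 5))
```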

4.1. Case I: bounded observations. We will first consider the following assumptions. 

Assumption 5 (Tightness: Case I). The following hold. 

(1) The signal is geometrically ergodic ($\|P^k(x,\cdot)-\lambda\|_V\le CV(x)\beta^k$, V has compact level sets).

(2) There exist strictly positive functions $u_\pm:F\to\,]0,\infty[$ such that

$$u_-(y)\le\Upsilon(x,y)\le u_+(y)\quad\text{for all } x\in E,\qquad \int\frac{u_+(y)^2}{u_-(y)}\,\varphi(dy)<\infty.$$

Assumption 5 is typically satisfied when the observations are of the additive noise type with a bounded observation function. As an example, consider the observation model $Y_k=h(X_k)+\xi_k$ on the observation state space $F=\mathbb{R}^d$, where $\xi_k$ are i.i.d. $N(0,\Sigma)$-random variables independent of $(X_k)_{k\ge0}$ for some strictly positive covariance matrix $\Sigma$, and $h:E\to\mathbb{R}^d$ is a continuous and bounded observation function. Then we can set

$$\varphi(dy)=\frac{e^{-y^*\Sigma^{-1}y/2}}{(2\pi)^{d/2}|\Sigma|^{1/2}}\,dy,\qquad\Upsilon(x,y)=\exp\left(y^*\Sigma^{-1}h(x)-\tfrac12\,h(x)^*\Sigma^{-1}h(x)\right),$$

and assumptions 1 and 3 are clearly satisfied for this observation model. Moreover, evidently



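$$u_-(y)=\exp\left(-\left\{\|y\|+\tfrac12\|h\|_\infty\right\}\|\Sigma^{-1}\|\,\|h\|_\infty\right),\qquad u_+(y)=\exp\left(\|y\|\,\|\Sigma^{-1}\|\,\|h\|_\infty\right),$$

where $\|h\|_\infty=\sup_{x\in E}\|h(x)\|$ and $\|\Sigma^{-1}\|$ is the operator norm, satisfy the requirement in assumption 5.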
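These bounds can be checked numerically in the scalar case $d=1$. The sketch below is our own illustration. It assumes $\Sigma=1$ and the hypothetical choice $h(x)=\tanh x$ (so $\|h\|_\infty=1$). It verifies $u_-(y)\le\Upsilon(x,y)\le u_+(y)$ on a grid. It then compares a trapezoidal approximation of $\int u_+^2/u_-\,d\varphi$ with the closed form that follows from $\mathbf{E}e^{c|Z|}=2e^{c^2s^2/2}\Phi(cs)$ for $Z\sim N(0,s^2)$.

```python
import math

SIGMA2 = 1.0   # hypothetical noise variance (d = 1, Sigma = SIGMA2)
H_SUP = 1.0    # ||h||_inf of the hypothetical observation function below

def h(x):
    # hypothetical bounded continuous observation function, sup |h| = H_SUP
    return H_SUP * math.tanh(x)

def upsilon(x, y):
    # density of N(h(x), Sigma) relative to the reference measure phi = N(0, Sigma)
    hx = h(x)
    return math.exp(y * hx / SIGMA2 - 0.5 * hx * hx / SIGMA2)

def u_minus(y):
    return math.exp(-(abs(y) + 0.5 * H_SUP) * H_SUP / SIGMA2)

def u_plus(y):
    return math.exp(abs(y) * H_SUP / SIGMA2)

def bounds_hold(xs, ys):
    # u_-(y) <= Upsilon(x, y) <= u_+(y) on a grid of test points
    return all(u_minus(y) <= upsilon(x, y) <= u_plus(y) for x in xs for y in ys)

def integral_numeric(half_width=12.0, step=1e-3):
    # trapezoidal approximation of  int u_+(y)^2 / u_-(y) phi(dy)
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        y = -half_width + i * step
        phi = math.exp(-y * y / (2 * SIGMA2)) / math.sqrt(2 * math.pi * SIGMA2)
        w = 0.5 if i in (0, n) else 1.0
        total += w * u_plus(y) ** 2 / u_minus(y) * phi
    return total * step

def integral_closed_form():
    # u_+^2 / u_- = exp(3 H |y| / Sigma + H^2 / (2 Sigma)); then use E exp(c|Z|)
    c = 3 * H_SUP / SIGMA2
    s = math.sqrt(SIGMA2)
    Phi = 0.5 * (1 + math.erf(c * s / math.sqrt(2)))
    return math.exp(H_SUP ** 2 / (2 * SIGMA2)) * 2 * math.exp(c * c * s * s / 2) * Phi
```

The finite value of `integral_closed_form()` confirms that the integrability requirement of assumption 5 holds in this example.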

4.2. Case II: strongly unbounded observations. To satisfy assumption 5, the observation function $h$ will generally need to be bounded. Our second set of assumptions is essentially the opposite scenario: we consider an observation model where $h$ is strongly unbounded, i.e., $\|h(x)\|$ converges to infinity in every direction (this is the requirement below that $\|h\|$ has compact level sets).

Assumption 6 (Tightness: Case II). Let $F=\mathbb{R}^d$, and suppose that $Y_k=h(X_k)+\sigma(X_k)\,\xi_k$, where $\xi_k$ are i.i.d. random variables independent of $(X_k)_{k\ge0}$. We assume the following:

(1) The signal is geometrically ergodic ($\|P^k(x,\cdot)-\lambda\|_V\le C\,V(x)\,\beta^k$, $V$ has compact level sets).

(2) $h:E\to\mathbb{R}^d$ and $\sigma:E\to\mathbb{R}^{d\times d}$ are continuous, and $\varepsilon\|v\|\le\|\sigma(x)v\|\le\varepsilon^{-1}\|v\|$ for all $x,v$, for some $\varepsilon>0$.

(3) The law of the observation noise has a strictly positive, bounded and continuous density $q_\xi:\mathbb{R}^d\to\,]0,\infty[$ with respect to the Lebesgue measure on $\mathbb{R}^d$.

(4) There is a nonincreasing $q:[0,\infty[\,\to\,]0,\infty[$, a norm $|\cdot|$ on $\mathbb{R}^d$, and $a_1,a_2>0$ such that

$$a_1\,q(|z|)\le q_\xi(z)\le a_2\,q(|z|)\quad\text{for all }z\in\mathbb{R}^d.$$

(5) There are constants $b_1,b_3>0$, $b_2,b_4\in\mathbb{R}$ and $p>0$ with $\mathbf{E}(\|\xi_k\|^p)<\infty$, such that

$$b_1\|h(x)\|^p+b_2\le V(x)\le b_3\|h(x)\|^p+b_4\quad\text{for all }x\in E.$$

Remark 4.2. Note that when assumption 6 is satisfied, we may always choose $\varphi$ to be the Lebesgue measure and $\Upsilon(x,y)=q_\xi(\sigma(x)^{-1}\{y-h(x)\})$, which is strictly positive, and $x\mapsto\Upsilon(x,y)$ is bounded and continuous for every $y$. We therefore automatically satisfy assumption 1 and the observation part of assumption 3. Moreover, geometric ergodicity implies that assumption 2 holds also. Finally, note that as $V$ is by definition presumed to have compact level sets, the assumption implies that $x\mapsto\|h(x)\|$ has compact level sets also (each set $\{x:\|h(x)\|\le r\}$ is closed and contained in the compact set $\{x:V(x)\le b_3r^p+b_4\}$), i.e., $h$ is strongly unbounded.






A typical example where assumption 6 is satisfied is the following. Let $E=F=\mathbb{R}^d$, and consider the observation model $Y_k=h(X_k)+\xi_k$, where $\xi_k\sim N(0,\Sigma)$ for some strictly positive covariance matrix $\Sigma$, and $h(x)=h_0(x)+h_1(x)$, where $h_0$ is bi-Lipschitz (i.e., it is Lipschitz, invertible, and its inverse is Lipschitz) and $h_1$ is a bounded continuous function. Moreover, assume that the signal is geometrically ergodic, where $V$ satisfies the growth condition $b_1'\|x\|^p+b_2'\le V(x)\le b_3'\|x\|^p+b_4'$ for some $p,b_1',b_3'>0$. Let us verify the requirements of assumption 6 in this setting.

First, the law of $\xi_k$ has a density $q_\xi(z)=\exp(-z^*\Sigma^{-1}z/2)/(2\pi)^{d/2}|\Sigma|^{1/2}$ with respect to the Lebesgue measure. Therefore $q_\xi$ is bounded, continuous, and strictly positive, and we may evidently set $|z|^2=z^*\Sigma^{-1}z$ (which defines a norm), $q(v)=\exp(-v^2/2)$ (which is nonincreasing), and $a_1=a_2=(2\pi)^{-d/2}|\Sigma|^{-1/2}$. Moreover, it is easily established that

$$l_1\|x\|-\|h_0(0)\|-\|h_1\|_\infty\le\|h(x)\|\le l_2\|x\|+\|h_0(0)\|+\|h_1\|_\infty,$$

where we have used that $l_1\|x-z\|\le\|h_0(x)-h_0(z)\|\le l_2\|x-z\|$ for some $l_1,l_2>0$ by the bi-Lipschitz property of $h_0$. We may therefore estimate

$$\frac{b_1'}{C_p\,l_2^p}\|h(x)\|^p-\frac{b_1'\,\alpha^p}{l_2^p}+b_2'\le V(x)\le\frac{C_p\,b_3'}{l_1^p}\|h(x)\|^p+\frac{C_p\,b_3'\,\alpha^p}{l_1^p}+b_4',$$

where we have written $(a+b)^p\le C_p(a^p+b^p)$ for $a,b\ge0$ (one can choose $C_p=\max(1,2^{p-1})$) and $\alpha=\|h_0(0)\|+\|h_1\|_\infty$. Finally, as any Gaussian has finite moments, $\mathbf{E}(\|\xi_k\|^p)<\infty$.
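Both ingredients of this estimate are easy to sanity-check numerically. The sketch below is our illustration and the functions in it are hypothetical. It checks $(a+b)^p\le C_p(a^p+b^p)$ with $C_p=\max(1,2^{p-1})$ on a grid. It also checks the two-sided bound on $\|h(x)\|$ in dimension $d=1$ for $h_0(x)=2x+\sin x$, which is bi-Lipschitz with $l_1=1$, $l_2=3$ because $h_0'(x)=2+\cos x\in[1,3]$, and for $h_1=\tanh$, so that $\|h_1\|_\infty=1$.

```python
import math

def c_p(p):
    # constant in (a + b)^p <= C_p (a^p + b^p) for a, b >= 0
    return max(1.0, 2.0 ** (p - 1))

def cp_inequality_holds(p, grid):
    # small relative slack absorbs rounding in the equality case a = b, p >= 1
    return all((a + b) ** p <= c_p(p) * (a ** p + b ** p) * (1 + 1e-12)
               for a in grid for b in grid)

# hypothetical 1-d instance: h0 is bi-Lipschitz with l1 = 1, l2 = 3,
# and h1 is bounded and continuous with sup |h1| = 1
L1, L2 = 1.0, 3.0

def h0(x):
    return 2.0 * x + math.sin(x)

def h1(x):
    return math.tanh(x)

def h(x):
    return h0(x) + h1(x)

def growth_bounds_hold(xs):
    # l1 |x| - alpha <= |h(x)| <= l2 |x| + alpha with alpha = |h0(0)| + ||h1||_inf
    alpha = abs(h0(0.0)) + 1.0
    return all(L1 * abs(x) - alpha <= abs(h(x)) <= L2 * abs(x) + alpha for x in xs)
```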

4.3. Uniform time average consistency. We have now introduced two sets of assumptions on 
the filtering model. Our main result states that either of these assumptions is sufficient for uniform 
time average consistency of the bootstrap particle filter. 

Theorem 4.3. Suppose that either assumptions 3, 4 and 5 hold, or that the signal transition kernel $P$ is Feller and that assumption 6 holds. In addition, suppose that $\mu(V)<\infty$. Then the tightness assumption of corollary 3.3 holds, and in particular

$$\limsup_{N\to\infty}\,\sup_{T\ge1}\,\mathbf{E}\left[\frac{1}{T}\sum_{k=1}^{T}\|\pi^N_k-\pi_k\|_{BL}\right]=0$$

holds true for the bootstrap particle filter. The proof is given in appendix B.5.
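The following minimal sketch runs the bootstrap filter alongside the exact filter on a hypothetical two-state hidden Markov model (the model, the names and the parameters are our own choices). The filter is written in the form $\pi^N_k=\mathsf{U}(Y_k,\pi^N_{k-})$, where $\pi^N_{k-}$ is the empirical measure of $N$ draws from $\pi^N_{k-1}P$ (multinomial resampling followed by a move through $P$). On a finite space the total variation distance $\sup_{|f|\le1}|\mu(f)-\nu(f)|$ dominates $\|\cdot\|_{BL}$, so we report the time-averaged total variation error. This illustrates the statement of the theorem and does not prove it.

```python
import math
import random

# Hypothetical two-state hidden Markov model (illustration only):
# signal kernel P on E = {0, 1}, observations Y_k = MEANS[X_k] + N(0, NOISE_SD^2) noise.
P = [[0.9, 0.1],
     [0.2, 0.8]]
MEANS = [-1.0, 1.0]
NOISE_SD = 1.0
MU = (0.5, 0.5)  # initial law of X_0

def likelihood(x, y):
    # Upsilon(x, y) up to a factor not depending on x (it cancels in the Bayes update)
    return math.exp(-0.5 * ((y - MEANS[x]) / NOISE_SD) ** 2)

def move(x, rng):
    # one draw from P(x, .)
    return 0 if rng.random() < P[x][0] else 1

def simulate(T, rng):
    # observations Y_0, ..., Y_T of one signal path started from MU
    x = 0 if rng.random() < MU[0] else 1
    ys = []
    for _ in range(T + 1):
        ys.append(MEANS[x] + NOISE_SD * rng.gauss(0.0, 1.0))
        x = move(x, rng)
    return ys

def exact_filter(ys):
    # pi_k = U(Y_k, pi_{k-1} P), with pi_0 = U(Y_0, MU)
    out, pred = [], list(MU)
    for y in ys:
        w = [likelihood(x, y) * pred[x] for x in range(2)]
        pi = [wi / sum(w) for wi in w]
        out.append(pi)
        pred = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
    return out

def bootstrap_filter(ys, N, rng):
    # pi^N_k = U(Y_k, pi^N_{k-}); pi^N_{k-} is the empirical measure of N draws
    # from pi^N_{k-1} P (multinomial resampling, then a move through P)
    particles = [0 if rng.random() < MU[0] else 1 for _ in range(N)]
    out = []
    for y in ys:
        weights = [likelihood(x, y) for x in particles]
        total = sum(weights)
        pi_n = [0.0, 0.0]
        for x, w in zip(particles, weights):
            pi_n[x] += w / total
        out.append(pi_n)
        parents = rng.choices(particles, weights=weights, k=N)
        particles = [move(x, rng) for x in parents]
    return out

def time_averaged_error(ys, N, seed):
    # average over k of sup_{|f| <= 1} |pi^N_k(f) - pi_k(f)|, an upper bound for ||.||_BL
    approx = bootstrap_filter(ys, N, random.Random(seed))
    exact = exact_filter(ys)
    errs = [sum(abs(a - b) for a, b in zip(p, q)) for p, q in zip(approx, exact)]
    return sum(errs) / len(errs)
```

For example, `time_averaged_error(simulate(200, random.Random(1)), 2000, seed=2)` should come out much smaller than the same call with `N=10`. The exact values depend on the seeds.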

Appendix A. Some Basic Facts on Weak Convergence 

The purpose of this appendix is to recall some basic facts on weak convergence of probability 
measures and transition kernels that are particularly useful in the setting of this paper. 

A.1. Weak convergence of kernels. We begin by showing that weak convergence of transition
probability kernels, in a sufficiently strong sense, can be iterated. 

Lemma A.1. Let $K_N:E\times\mathcal{B}(E)\to[0,1]$, $N\in\mathbb{N}$ be a sequence of transition kernels on a Polish space $E$, and let $K$ be another such kernel. Then for every bounded continuous $f:E\to\mathbb{R}$

$$\int f(z)\,K_N(x_N,dz)\xrightarrow{N\to\infty}\int f(z)\,K(x,dz)\quad\text{whenever }x_N\xrightarrow{N\to\infty}x$$

if and only if for any $j\ge1$, we have $\nu_NK_N^j\Rightarrow\nu K^j$ as $N\to\infty$ whenever $\nu_N\Rightarrow\nu$.

Proof. The if part follows trivially by choosing $\nu_N=\delta_{x_N}$, $\nu=\delta_x$, and $j=1$. To prove the only if part, suppose we have established that the result holds for $j\le k$. Then it clearly holds also for $j\le k+1$, provided it holds for $j=1$ (apply the case $j=1$ to $\nu_NK_N^k\Rightarrow\nu K^k$). By induction, it therefore suffices to consider the case $j=1$.

As $\nu_N\Rightarrow\nu$, we can construct using the Skorokhod representation theorem a sequence of random variables $X_N\to X$ a.s. such that $X_N\sim\nu_N$, $X\sim\nu$. Let $f$ be bounded and continuous, and note that $\nu_NK_Nf=\mathbf{E}(K_Nf(X_N))$ and $\nu Kf=\mathbf{E}(Kf(X))$. But by our assumption $K_Nf(X_N)\to Kf(X)$ a.s., so the claim follows immediately using dominated convergence. □

A.2. Tightness of random measures. As many of the stochastic processes in this paper are measure-valued, we require a simple condition for tightness of a family of measure-valued random variables. The following necessary and sufficient condition is quoted from [13, corollary 2.2]. As usual, if $\rho$ is a $\mathcal{P}(E)$-valued random variable, we denote by $\mathbf{E}\rho\in\mathcal{P}(E)$ the probability measure defined by $\mathbf{E}\rho(A)=\mathbf{E}(\rho(A))$ for all $A\in\mathcal{B}(E)$. Note that this is the barycenter of $\mathrm{Law}(\rho)\in\mathcal{P}(\mathcal{P}(E))$.
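As a concrete illustration of the barycenter (our own example, not from the paper), take for $\rho$ the empirical measure of $N$ i.i.d. draws from a probability vector $\nu$ on a three-point set. Assuming standard multinomial resampling, its law is the resampling kernel $R_N(\nu,\cdot)$ of the bootstrap filter. The sketch enumerates all outcomes with exact rational arithmetic and confirms $\mathbf{E}\rho=\nu$, which is the fact used in the proof of lemma B.8. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def empirical_law(nu, N):
    # Law of the empirical measure of N i.i.d. draws from nu on {0, ..., m-1},
    # returned as a list of (probability, empirical measure) pairs.
    m = len(nu)
    law = []
    for draws in product(range(m), repeat=N):
        prob = Fraction(1)
        for i in draws:
            prob *= nu[i]
        emp = [Fraction(draws.count(j), N) for j in range(m)]
        law.append((prob, emp))
    return law

def barycenter(law):
    # E rho({j}) = sum over outcomes of P(outcome) * rho({j})
    m = len(law[0][1])
    return [sum(p * rho[j] for p, rho in law) for j in range(m)]
```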

Lemma A.2. Let $\{\rho_i:i\in I\}$ be a family of $\mathcal{P}(E)$-valued random variables on $(\Omega,\mathcal{F},\mathbf{P})$. Then this family is tight if and only if the family of probability measures $\{\mathbf{E}\rho_i:i\in I\}\subset\mathcal{P}(E)$ is tight.

A.3. Tightness in product spaces. The following elementary lemma will be used repeatedly.

Lemma A.3. Let $\{\Xi_i:i\in I\}$ be a family of probability measures on $E\times\tilde E$, where $E,\tilde E$ are Polish. Then this family is tight iff its marginals $\{\Xi_i(\cdot\times\tilde E):i\in I\}$ and $\{\Xi_i(E\times\cdot):i\in I\}$ are tight.

The proof is straightforward and follows along the lines of [27, lemma 1.4.3].



Appendix B. Proofs 
This appendix contains the proofs that were omitted from the main text. 

B.1. Proof of Proposition 2.1. Note that $\pi_{k-1}$ is a function of $Y_0,\ldots,Y_{k-1}$ only. Therefore

$$\mathbf{E}(f(X_k,\pi_k)\,|\,X_0,\ldots,X_k,Y_0,\ldots,Y_{k-1})=\int f(X_k,\mathsf{U}(y,\pi_{k-1}P))\,\Upsilon(X_k,y)\,\varphi(dy),$$

where we have used the hidden Markov property and $\pi_k=\mathsf{U}(Y_k,\pi_{k-1}P)$. Using the Markov property of $(X_k)_{k\ge0}$ and the tower property of the conditional expectation, we obtain

$$\mathbf{E}(f(X_k,\pi_k)\,|\,X_0,\ldots,X_{k-1},Y_0,\ldots,Y_{k-1})=\int f(x',\mathsf{U}(y,\pi_{k-1}P))\,\Upsilon(x',y)\,\varphi(dy)\,P(X_{k-1},dx').$$

As $\sigma\{X_0,\ldots,X_{k-1},\pi_0,\ldots,\pi_{k-1}\}\subset\sigma\{X_0,\ldots,X_{k-1},Y_0,\ldots,Y_{k-1}\}$, the expression for $\Pi$ follows immediately. The expression for the initial measure $M$ follows along similar lines.

Ergodic property: We begin by proving existence of the invariant measure. Consider a copy $(X_k,Y_k)_{k\ge0}$ of the hidden Markov model started at the stationary distribution $X_0\sim\lambda$. Using stationarity, the process can be extended to negative times $(X_k,Y_k)_{k\in\mathbb{Z}}$ also. Now consider the measure-valued process $(X_k,\mathbf{P}(X_k\in\cdot\,|\,Y_k,Y_{k-1},\ldots))$ (the regular conditional probability always exists in a Polish state space). It is easily seen that this is a stationary Markov process with transition kernel $\Pi$. Thus the law of $(X_0,\mathbf{P}(X_0\in\cdot\,|\,Y_0,Y_{-1},\ldots))$ is an invariant measure for $\Pi$.

It remains to establish uniqueness of the invariant measure. Endow the Polish space $E\times\mathcal{P}(E)$ with the Polish metric $D((x,\nu),(x',\nu'))=d(x,x')+\|\nu-\nu'\|_{BL}$, where $d$ is a Polish metric on $E$. In lemma B.1 below, it is shown that assumption 2 implies that

$$\left|\int F(z,\alpha)\,\Pi^j(x,\nu,dz,d\alpha)-\int F(z,\alpha)\,\Pi^j(x,\nu',dz,d\alpha)\right|\xrightarrow{j\to\infty}0$$

whenever $F$ is $D$-Lipschitz. Let $\Lambda$ and $\Lambda'$ be two $\Pi$-invariant measures. Then the marginals of $\Lambda$ and $\Lambda'$ on the signal state space are invariant measures for $P$. But assumption 2 implies that $\lambda$ is the unique invariant measure for the signal, so we must have $\Lambda(A\times\mathcal{P}(E))=\Lambda'(A\times\mathcal{P}(E))=\lambda(A)$. By the Polish assumption, we therefore have the disintegrations

$$\Lambda(A\times B)=\int I_A(x)\,I_B(\nu)\,\Lambda_x(d\nu)\,\lambda(dx),\qquad\Lambda'(A\times B)=\int I_A(x)\,I_B(\nu)\,\Lambda'_x(d\nu)\,\lambda(dx).$$
It follows that

$$\left|\int F\,d\Lambda-\int F\,d\Lambda'\right|=\left|\int\Pi^jF(x,\nu)\,\Lambda(dx,d\nu)-\int\Pi^jF(x,\nu')\,\Lambda'(dx,d\nu')\right|\le\int\left|\Pi^jF(x,\nu)-\Pi^jF(x,\nu')\right|\Lambda_x(d\nu)\,\Lambda'_x(d\nu')\,\lambda(dx)\xrightarrow{j\to\infty}0$$

whenever $F$ is uniformly bounded and $D$-Lipschitz. But this class of functions is measure determining, so $\Lambda$ and $\Lambda'$ must coincide. The proof is complete. □

Lemma B.1. Let $D((x,\nu),(x',\nu'))=d(x,x')+\|\nu-\nu'\|_{BL}$, where $d$ is a Polish metric on $E$. Then

$$\left|\int F(z,\alpha)\,\Pi^j(x,\nu,dz,d\alpha)-\int F(z,\alpha)\,\Pi^j(x,\nu',dz,d\alpha)\right|\xrightarrow{j\to\infty}0$$

whenever $F$ is $D$-Lipschitz, provided assumptions 1 and 2 hold.

Proof. Consider a copy $(X_k,Y_k)_{k\ge0}$ of the hidden Markov model started at the initial measure $X_0\sim P(x,\cdot)$, and define recursively $\pi_k=\mathsf{U}(Y_k,\pi_{k-1}P)$ and $\pi'_k=\mathsf{U}(Y_k,\pi'_{k-1}P)$, $k\ge1$, with $\pi_0=\mathsf{U}(Y_0,\nu P)$ and $\pi'_0=\mathsf{U}(Y_0,\nu'P)$. Then for $j\ge1$, the measure $\Pi^j(x,\nu,dz,d\alpha)$ coincides with the law of $(X_{j-1},\pi_{j-1})$, and similarly $\Pi^j(x,\nu',dz,d\alpha)$ coincides with the law of $(X_{j-1},\pi'_{j-1})$. Thus, writing $L_F$ for the $D$-Lipschitz constant of $F$,

$$\left|\int F(z,\alpha)\,\Pi^j(x,\nu,dz,d\alpha)-\int F(z,\alpha)\,\Pi^j(x,\nu',dz,d\alpha)\right|=|\mathbf{E}(F(X_{j-1},\pi_{j-1})-F(X_{j-1},\pi'_{j-1}))|\le L_F\,\mathbf{E}(\|\pi_{j-1}-\pi'_{j-1}\|_{BL})\le L_F\,\mathbf{E}(\|\pi_{j-1}-\pi'_{j-1}\|_{TV}).$$

But assumptions 1 and 2 allow us to apply the filter stability result [29, corollary 5.5], which implies that $\mathbf{E}(\|\pi_{j-1}-\pi'_{j-1}\|_{TV})\to0$ as $j\to\infty$. This completes the proof. □

B.2. Proof of Theorem 2.2. The proof of theorem 2.2 proceeds in several steps. Throughout this section of the appendix we always presume that the assumptions of theorem 2.2 are in force. We begin by proving that the convergence holds on every finite time horizon.



Lemma B.2. $\mathbf{E}(\{\pi^N_k(f)-\pi_k(f)\}^2)\xrightarrow{N\to\infty}0$ for any $k<\infty$ and bounded continuous $f:E\to\mathbb{R}$.



Proof. As $\mathcal{S}$ is independent of $(X_k,Y_k)_{k\ge0}$, we can write $\pi_k(f)=\mathbf{E}(f(X_k)\,|\,\sigma\{Y_0,\ldots,Y_k\}\vee\mathcal{S})$. Therefore

$$\mathbf{E}(\{\pi^N_k(f)-\pi_k(f)\}^2)=\mathbf{E}(\pi^N_k(f)^2-2f(X_k)\,\pi^N_k(f))+\mathbf{E}(\pi_k(f)^2),$$

where we have used assumption 3 and the tower property of the conditional expectation. Define the bounded continuous function $F:E\times\mathcal{P}(E)\to\mathbb{R}$ as $F(x,\nu)=\nu(f)^2-2f(x)\,\nu(f)$. By lemma A.1 and the first condition of theorem 2.2 we have $M^N\Pi_N^kF\to M\Pi^kF$ as $N\to\infty$. Therefore

$$\mathbf{E}(\pi^N_k(f)^2-2f(X_k)\,\pi^N_k(f))\xrightarrow{N\to\infty}\mathbf{E}(\pi_k(f)^2-2f(X_k)\,\pi_k(f))=-\mathbf{E}(\pi_k(f)^2).$$



Substituting in the above expression completes the proof. □
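For completeness, the identity at the start of the proof of lemma B.2 can be spelled out. This is a short expansion, under the assumption used there that $\pi^N_k(f)$ is bounded and measurable with respect to $\sigma\{Y_0,\ldots,Y_k\}\vee\mathcal{S}$. Expanding the square and conditioning the cross term on $\sigma\{Y_0,\ldots,Y_k\}\vee\mathcal{S}$ gives

$$\mathbf{E}(\{\pi^N_k(f)-\pi_k(f)\}^2)=\mathbf{E}(\pi^N_k(f)^2)-2\,\mathbf{E}(\pi^N_k(f)\,\pi_k(f))+\mathbf{E}(\pi_k(f)^2),$$

$$\mathbf{E}(\pi^N_k(f)\,\pi_k(f))=\mathbf{E}\big(\pi^N_k(f)\,\mathbf{E}(f(X_k)\,|\,\sigma\{Y_0,\ldots,Y_k\}\vee\mathcal{S})\big)=\mathbf{E}(f(X_k)\,\pi^N_k(f)).$$

Substituting the second line into the first yields the identity. Taking $\pi_k$ in place of $\pi^N_k$ in the second line gives $\mathbf{E}(f(X_k)\,\pi_k(f))=\mathbf{E}(\pi_k(f)^2)$, which is the final equality in the limit display.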
We now strengthen this lemma to prove $\|\cdot\|_{BL}$-convergence.






Lemma B.3. $\mathbf{E}(\|\pi^N_k-\pi_k\|_{BL})\xrightarrow{N\to\infty}0$ for any $k<\infty$.

Remark B.4. The quantity $\mathbf{E}(\|\pi^N_k-\pi_k\|_{BL})$ is well defined, as $\|\pi^N_k-\pi_k\|_{BL}$ is measurable by [30, corollary A.2]. We will therefore employ such expressions in the following without further comment.

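Proof. Fix $\varepsilon>0$. As $X_k$ takes values in the Polish space $E$, there exists a compact subset $K\subset E$ such that $\mathbf{P}(X_k\in K)>1-\varepsilon$. Moreover, by the Arzelà-Ascoli theorem, there is an $m<\infty$ and $f_1,\ldots,f_m\in\mathrm{Lip}(E)$ such that $\min_\ell\sup_{x\in K}|f(x)-f_\ell(x)|<\varepsilon$ whenever $f\in\mathrm{Lip}(E)$. Define the open set $K^\varepsilon=\{x\in E:d(x,K)<\varepsilon\}$. Then $\min_\ell\sup_{x\in K^\varepsilon}|f(x)-f_\ell(x)|<3\varepsilon$ for any $f\in\mathrm{Lip}(E)$, so

$$\begin{aligned}
\mathbf{E}(\|\pi^N_k-\pi_k\|_{BL})
&\le\mathbf{E}\Big(\sup_{f\in\mathrm{Lip}(E)}|\pi^N_k(fI_{K^\varepsilon})-\pi_k(fI_{K^\varepsilon})|\Big)+\mathbf{E}\Big(\sup_{f\in\mathrm{Lip}(E)}|\pi^N_k(fI_{(K^\varepsilon)^c})-\pi_k(fI_{(K^\varepsilon)^c})|\Big)\\
&\le6\varepsilon+\mathbf{E}\Big(\max_\ell|\pi^N_k(f_\ell I_{K^\varepsilon})-\pi_k(f_\ell I_{K^\varepsilon})|\Big)+\mathbf{E}(\pi^N_k((K^\varepsilon)^c))+\mathbf{P}(X_k\in(K^\varepsilon)^c)\\
&\le6\varepsilon+\mathbf{E}\Big(\max_\ell|\pi^N_k(f_\ell)-\pi_k(f_\ell)|\Big)+2\,\mathbf{E}(\pi^N_k((K^\varepsilon)^c))+2\,\mathbf{P}(X_k\in(K^\varepsilon)^c)\\
&\le6\varepsilon+\sum_{\ell=1}^m\sqrt{\mathbf{E}(\{\pi^N_k(f_\ell)-\pi_k(f_\ell)\}^2)}+2\,\mathbf{E}(\pi^N_k((K^\varepsilon)^c))+2\,\mathbf{P}(X_k\in(K^\varepsilon)^c).
\end{aligned}$$

As $(K^\varepsilon)^c$ is closed and $M^N\Pi_N^k\Rightarrow M\Pi^k$ by lemma A.1 and the first condition of theorem 2.2, applying the Portmanteau theorem to the second term and lemma B.2 to the first term gives

$$\limsup_{N\to\infty}\mathbf{E}(\|\pi^N_k-\pi_k\|_{BL})\le6\varepsilon+4\,\mathbf{P}(X_k\in(K^\varepsilon)^c)\le6\varepsilon+4\,\mathbf{P}(X_k\in K^c)<10\varepsilon.$$

But $\varepsilon>0$ was arbitrary, so the proof is complete. □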

We have now established convergence of the filters as $N\to\infty$ for a fixed time $k$. The idea is now to repeat the proofs for the case where we let the number of particles and time go to infinity simultaneously. We will repeat almost identically the steps used in the last two lemmas, where the finite time weak convergence $M^N\Pi_N^k\Rightarrow M\Pi^k$ used in the proofs is replaced by the following ergodic lemma (recall that $\Lambda$ is the unique invariant measure of $\Pi$).

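Lemma B.5. For any sequence $T_N\nearrow\infty$ as $N\to\infty$, define the probability measures $\Lambda_N$ by

$$\int F(x,\nu)\,\Lambda_N(dx,d\nu):=\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}F(X_k,\pi^N_k)\right]$$

for every $N\in\mathbb{N}$. Then $\Lambda_N\Rightarrow\Lambda$ as $N\to\infty$.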

Proof. We first show that the family $\{\Lambda_N:N\in\mathbb{N}\}$ is tight. It suffices to show that the marginals are tight by lemma A.3. But the first marginal of $\Lambda_N$ is $T_N^{-1}\sum_{k=1}^{T_N}\mu P^k$, which converges to the signal invariant measure $\lambda$ by assumption 2. This establishes tightness of the first marginal. By lemma A.2, tightness of the second marginal follows from the second condition of theorem 2.2.

Having established tightness, it remains to show that every convergent subsequence of $\{\Lambda_N:N\in\mathbb{N}\}$ converges to $\Lambda$. In fact, it suffices to show that the limit of every convergent subsequence must be an invariant measure of $\Pi$, as the latter is unique by proposition 2.1.

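Let $\Lambda_{Q(N)}$ be a weakly convergent subsequence of $\{\Lambda_N:N\in\mathbb{N}\}$ and denote its limit as $\tilde\Lambda$. By the first condition of theorem 2.2 and lemma A.1 we have $\Lambda_{Q(N)}\Pi_{Q(N)}\Rightarrow\tilde\Lambda\Pi$. But note that

$$\Lambda_{Q(N)}\Pi_{Q(N)}=\frac{1}{T_{Q(N)}}\sum_{k=1}^{T_{Q(N)}}M^{Q(N)}\Pi_{Q(N)}^{k+1}=\Lambda_{Q(N)}+\frac{1}{T_{Q(N)}}\left\{M^{Q(N)}\Pi_{Q(N)}^{T_{Q(N)}+1}-M^{Q(N)}\Pi_{Q(N)}\right\}.$$

We therefore have

$$\|\tilde\Lambda-\tilde\Lambda\Pi\|_{BL}=\lim_{N\to\infty}\|\Lambda_{Q(N)}-\Lambda_{Q(N)}\Pi_{Q(N)}\|_{BL}\le\limsup_{N\to\infty}\frac{2}{T_{Q(N)}}=0,$$

so $\tilde\Lambda$ is an invariant measure for $\Pi$. □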



We now repeat the arguments of lemmas B.2 and B.3 with the necessary modifications.
Lemma B.6. For any sequence $T_N\nearrow\infty$ as $N\to\infty$,

$$\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}\{\pi^N_k(f)-\pi_k(f)\}^2\right]\xrightarrow{N\to\infty}0$$

for any bounded continuous function $f:E\to\mathbb{R}$.
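Proof. As in the proof of lemma B.2, we can write

$$\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}\{\pi^N_k(f)-\pi_k(f)\}^2\right]=\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}F(X_k,\pi^N_k)\right]+\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}\pi_k(f)^2\right],$$

where $F(x,\nu)=\nu(f)^2-2f(x)\,\nu(f)$. By lemma B.5,

$$\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}F(X_k,\pi^N_k)\right]\xrightarrow{N\to\infty}\int F\,d\Lambda=-\mathbf{E}(\mathbf{E}(f(X_0)\,|\,Y_0,Y_{-1},\ldots)^2),$$

where we have used the expression for $\Lambda$ in terms of the stationary copy $(X_k,Y_k)_{k\in\mathbb{Z}}$ given in the proof of proposition 2.1. The proof would evidently be complete if we can show that

$$\limsup_{k\to\infty}\mathbf{E}(\pi_k(f)^2)\le\mathbf{E}(\mathbf{E}(f(X_0)\,|\,Y_0,Y_{-1},\ldots)^2).$$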

To this end, we proceed as follows. First, note that

$$\mathbf{E}(\pi_{k+\ell}(f)^2)=\mathbf{E}(\mathbf{E}(f(X_{k+\ell})\,|\,Y_0,\ldots,Y_{k+\ell})^2)\le\mathbf{E}(\mathbf{E}(f(X_{k+\ell})\,|\,X_0,\ldots,X_\ell,Y_0,\ldots,Y_{k+\ell})^2),$$

where we have used the tower property of the conditional expectation and Jensen's inequality. But by the Markov property of $(X_k,Y_k)_{k\ge0}$, we can write

$$\mathbf{E}(f(X_{k+\ell})\,|\,X_0,\ldots,X_\ell,Y_0,\ldots,Y_{k+\ell})=\mathbf{E}(f(X_{k+\ell})\,|\,X_\ell,Y_\ell,\ldots,Y_{k+\ell}):=G_k(X_\ell,Y_\ell,\ldots,Y_{k+\ell}),$$

where the function $G_k$ does not depend on $\ell$. Using assumption 2, it follows easily that

$$\limsup_{\ell\to\infty}\mathbf{E}(\pi_\ell(f)^2)=\limsup_{\ell\to\infty}\mathbf{E}(\pi_{k+\ell}(f)^2)\le\mathbf{E}(G_k(X_{-k},Y_{-k},\ldots,Y_0)^2).$$

But $G_k(X_{-k},Y_{-k},\ldots,Y_0)=\mathbf{E}(f(X_0)\,|\,Y_0,\ldots,Y_{-k},X_{-k})$, so by the Markov property of $(X_k,Y_k)_{k\in\mathbb{Z}}$

$$\limsup_{\ell\to\infty}\mathbf{E}(\pi_\ell(f)^2)\le\mathbf{E}(G_k(X_{-k},Y_{-k},\ldots,Y_0)^2)=\mathbf{E}(\mathbf{E}(f(X_0)\,|\,\sigma\{Y_\ell:\ell\le0\}\vee\sigma\{X_\ell:\ell\le-k\})^2)$$

for all $k$. Letting $k\to\infty$ in this expression and using that

$$\bigcap_{k\ge0}\sigma\{Y_\ell:\ell\le0\}\vee\sigma\{X_\ell:\ell\le-k\}=\sigma\{Y_\ell:\ell\le0\}\quad\mathbf{P}\text{-a.s.}$$

by [29, theorem 4.2] (which holds by virtue of assumptions 1 and 2), the proof is complete. □
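Lemma B.7. For any sequence $T_N\nearrow\infty$ as $N\to\infty$,

$$\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}\|\pi^N_k-\pi_k\|_{BL}\right]\xrightarrow{N\to\infty}0.$$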



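Proof. Fix $\varepsilon>0$, and choose a compact subset $K\subset E$ such that $\lambda(K)>1-\varepsilon$. Construct $f_1,\ldots,f_m\in\mathrm{Lip}(E)$ and $K^\varepsilon$ as in the proof of lemma B.3. Then we can estimate

$$\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}\|\pi^N_k-\pi_k\|_{BL}\right]\le6\varepsilon+\sum_{\ell=1}^m\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}\{\pi^N_k(f_\ell)-\pi_k(f_\ell)\}^2\right]^{1/2}+2\,\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}\pi^N_k((K^\varepsilon)^c)\right]+\frac{2}{T_N}\sum_{k=1}^{T_N}\mathbf{P}(X_k\in(K^\varepsilon)^c).$$

Applying lemma B.6 to the first term, lemma B.5 and the Portmanteau theorem to the second term, and assumption 2 to the third term, we find that

$$\limsup_{N\to\infty}\mathbf{E}\left[\frac{1}{T_N}\sum_{k=1}^{T_N}\|\pi^N_k-\pi_k\|_{BL}\right]\le6\varepsilon+4\,\lambda((K^\varepsilon)^c)\le6\varepsilon+4\,\lambda(K^c)<10\varepsilon.$$

But $\varepsilon>0$ was arbitrary, so the proof is complete. □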

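We can now complete the proof of theorem 2.2.

Proof of Theorem 2.2. Suppose that

$$\limsup_{N\to\infty}\,\sup_{T\ge1}\,\mathbf{E}\left[\frac{1}{T}\sum_{k=1}^{T}\|\pi^N_k-\pi_k\|_{BL}\right]=\varepsilon>0.$$

Then we can find subsequences $Q(N)\nearrow\infty$ and $T_{Q(N)}$ such that

$$\mathbf{E}\left[\frac{1}{T_{Q(N)}}\sum_{k=1}^{T_{Q(N)}}\|\pi^{Q(N)}_k-\pi_k\|_{BL}\right]>\frac{\varepsilon}{2}\quad\text{for all }N.$$

Suppose first that $T_{Q(N)}\le T_{\max}$ is a bounded sequence. Then lemma B.3 gives

$$\mathbf{E}\left[\frac{1}{T_{Q(N)}}\sum_{k=1}^{T_{Q(N)}}\|\pi^{Q(N)}_k-\pi_k\|_{BL}\right]\le\max_{k\le T_{\max}}\mathbf{E}(\|\pi^{Q(N)}_k-\pi_k\|_{BL})\xrightarrow{N\to\infty}0,$$

so we have a contradiction. But if $T_{Q(N)}$ is an unbounded sequence, we can find a further subsequence $R(N)\nearrow\infty$ such that $T_{R(N)}\nearrow\infty$, and by lemma B.7

$$\mathbf{E}\left[\frac{1}{T_{R(N)}}\sum_{k=1}^{T_{R(N)}}\|\pi^{R(N)}_k-\pi_k\|_{BL}\right]\xrightarrow{N\to\infty}0,$$

which is again a contradiction. The proof is complete. □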
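B.3. Proof of Corollary 2.3. Note that we can estimate

$$\left|\,\mathbf{E}\left|\frac1T\sum_{k=1}^T\Big(f(X_k)-\int f\,d\pi^N_k\Big)\right|-\mathbf{E}\left|\frac1T\sum_{k=1}^T\Big(f(X_k)-\int f\,d\pi_k\Big)\right|\,\right|\le\left(\mathbf{E}\left[\frac1T\sum_{k=1}^T\{\pi^N_k(f)-\pi_k(f)\}^2\right]\right)^{1/2}.$$

The result is now easily obtained by following the same steps as in the proof of theorem 2.2. □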
B.4. Proof of Proposition 3.2. We begin by proving a general continuity result for $R_N$.

Lemma B.8. $R_N(\nu_N,\cdot)\Rightarrow\delta_\nu$ as $N\to\infty$ whenever $\nu_N\Rightarrow\nu$ as $N\to\infty$.

Proof. It follows immediately from the definition that the barycenter of $R_N(\rho,\cdot)$ is $\rho$ for any $\rho\in\mathcal{P}(E)$. Therefore, by lemma A.2, the sequence $\{R_N(\nu_N,\cdot):N\in\mathbb{N}\}$ is tight (its barycenters $\nu_N$ converge weakly and are therefore tight). It thus suffices to prove that every convergent subsequence converges to $\delta_\nu$. Let $Q(N)$ be any subsequence such that $R_{Q(N)}(\nu_{Q(N)},\cdot)\Rightarrow R$ for some $R\in\mathcal{P}(\mathcal{P}(E))$. Note that for any probability measure $\rho$

$$\int|\rho'(f)-\rho(f)|\,R_N(\rho,d\rho')\le\left[\int\{\rho'(f)-\rho(f)\}^2\,R_N(\rho,d\rho')\right]^{1/2}=\left[\frac{\rho(f^2)-\rho(f)^2}{N}\right]^{1/2}\le\frac{\|f\|_\infty}{\sqrt N}.$$

In particular, this shows that

$$\int|\nu'(f)-\nu(f)|\,R(d\nu')=\lim_{N\to\infty}\int|\nu'(f)-\nu(f)|\,R_{Q(N)}(\nu_{Q(N)},d\nu')\le\limsup_{N\to\infty}\int|\nu'(f)-\nu_{Q(N)}(f)|\,R_{Q(N)}(\nu_{Q(N)},d\nu')+\lim_{N\to\infty}|\nu_{Q(N)}(f)-\nu(f)|=0$$

for any bounded continuous function $f:E\to\mathbb{R}$. Thus we must have $R=\delta_\nu$. □
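The variance identity in the first display of the proof of lemma B.8 is the standard Monte Carlo computation. Assume, as for the bootstrap filter, that $R_N(\rho,\cdot)$ is the law of $\rho'=\frac1N\sum_{i=1}^N\delta_{\zeta_i}$ with $\zeta_1,\ldots,\zeta_N$ i.i.d. with law $\rho$ (the symbols $\zeta_i$ are introduced here only for illustration). Then

$$\int\{\rho'(f)-\rho(f)\}^2\,R_N(\rho,d\rho')=\mathbf{E}\left[\left(\frac1N\sum_{i=1}^N\{f(\zeta_i)-\rho(f)\}\right)^2\right]=\frac{1}{N^2}\sum_{i=1}^N\mathbf{E}(\{f(\zeta_i)-\rho(f)\}^2)=\frac{\rho(f^2)-\rho(f)^2}{N},$$

since the cross terms vanish by independence and $\mathbf{E}f(\zeta_i)=\rho(f)$. The bound $\rho(f^2)-\rho(f)^2\le\|f\|_\infty^2$ then gives the rate $\|f\|_\infty/\sqrt N$.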

We can now complete the proof.

Proof of Proposition 3.2. As $\Upsilon(\cdot,y)$ is bounded and continuous (assumption 3), we have

$$\int f(x)\,\Upsilon(x,y)\,\nu_n(dx)\to\int f(x)\,\Upsilon(x,y)\,\nu(dx)$$

for every $y$ whenever $f:E\to\mathbb{R}$ is bounded and continuous and $\nu_n\Rightarrow\nu$. This implies that $\mathsf{U}(y,\nu_n)\Rightarrow\mathsf{U}(y,\nu)$ for every $y$, so in particular $(x',\nu')\mapsto F(x',\mathsf{U}(y,\nu'))\,\Upsilon(x',y)$ is bounded and continuous for every $y$ whenever $F:E\times\mathcal{P}(E)\to\mathbb{R}$ is a bounded continuous function. Using the Feller property of $P$ and lemma B.8, it follows that whenever $x_N\to x$ and $\nu_N\Rightarrow\nu$

$$\int F(x',\mathsf{U}(y,\nu'))\,\Upsilon(x',y)\,R_N(\nu_NP,d\nu')\,P(x_N,dx')\xrightarrow{N\to\infty}\int F(x',\mathsf{U}(y,\nu P))\,\Upsilon(x',y)\,P(x,dx')$$

for every $y$ and bounded continuous function $F$. But then we obtain by dominated convergence

$$\int F(x',\nu')\,\Pi_N(x_N,\nu_N,dx',d\nu')=\int F(x',\mathsf{U}(y,\nu'))\,\Upsilon(x',y)\,R_N(\nu_NP,d\nu')\,P(x_N,dx')\,\varphi(dy)\xrightarrow{N\to\infty}\int F(x',\mathsf{U}(y,\nu P))\,\Upsilon(x',y)\,P(x,dx')\,\varphi(dy)=\int F(x',\nu')\,\Pi(x,\nu,dx',d\nu').$$

It remains to show that $M^N\Rightarrow M$. This follows immediately, however, from lemma B.8, the fact that $\pi\mapsto\mathsf{U}(y,\pi)$ is continuous, and dominated convergence. The finite time convergence now follows from lemma B.3 (which does not rely on assumption 2), and the proof is complete. □

B.5. Proof of Theorem 4.3. As both assumptions require geometric ergodicity, we fix throughout the corresponding function $V$ (which, by definition, is presumed to have compact level sets). To complete the proof, it only remains to prove the tightness assumption of corollary 3.3. We will in fact verify the simpler sufficient condition in lemma A.2 through the following elementary result.

Lemma B.9. Suppose that $\sup_{k,N}\mathbf{E}\pi^N_k(V)<\infty$. Then the tightness assumption holds.

Proof. The level sets $C_r=\{x\in E:V(x)\le r\}$ are compact. But as

$$\sup_{k,N}\mathbf{E}\pi^N_k(C_r^c)=\sup_{k,N}\mathbf{E}\pi^N_k(V>r)\le\frac{\sup_{k,N}\mathbf{E}\pi^N_k(V)}{r}\xrightarrow{r\to\infty}0,$$

evidently the family $\{\mathbf{E}\pi^N_k:k,N\ge1\}$ is tight, and we may invoke lemma A.2. □
In the following, it is convenient to introduce the measure-valued process $\pi^N_{k-}$ for which

$$\pi^N_k(A)=\frac{\int I_A(x)\,\Upsilon(x,Y_k)\,\pi^N_{k-}(dx)}{\int\Upsilon(x,Y_k)\,\pi^N_{k-}(dx)},$$

so that $\pi^N_k=\mathsf{U}(Y_k,\pi^N_{k-})$. Note that $\pi^N_{k-}$ is the bootstrap particle filter approximation to the one step predictor $\pi_{k-}$ (in fact, our main results are easily adapted to establish uniform time average convergence of $\pi^N_{k-}$ to $\pi_{k-}$). The following result is the key tool that allows us to establish tightness. The condition of this lemma (essentially, the requirement that the update step $\pi^N_{k-}\mapsto\pi^N_k$ does not 'expand' too much) will be verified separately under assumptions 5 and 6.
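Lemma B.10. Suppose the signal is geometrically ergodic and $\mu(V)<\infty$. If there exist constants $c_1,c_2>0$ such that $\mathbf{E}\pi^N_k(V)\le c_1\,\mathbf{E}\pi^N_{k-}(V)+c_2$ for all $k$, then $\sup_{k,N}\mathbf{E}\pi^N_k(V)<\infty$.

Proof. Note that $\mathbf{E}\pi_{k-}(f)=\mu(P^kf)=\mathbf{E}\pi^N_{0-}(P^kf)$ for all $|f|\le V$. Therefore

$$\mathbf{E}\pi^N_{k-}(f)-\mathbf{E}\pi_{k-}(f)=\sum_{\ell=1}^k\{\mathbf{E}\pi^N_{\ell-}(P^{k-\ell}f)-\mathbf{E}\pi^N_{(\ell-1)-}(P^{k-\ell+1}f)\}.$$

But note that $\mathbf{E}\pi^N_{\ell-}(f)=\mathbf{E}\pi^N_{\ell-1}(Pf)$, as we may average over the last sampling step. Therefore

$$\mathbf{E}\pi^N_{k-}(f)-\mathbf{E}\pi_{k-}(f)=\sum_{\ell=1}^k\{\mathbf{E}\pi^N_{\ell-1}(P^{k-\ell+1}f)-\mathbf{E}\pi^N_{(\ell-1)-}(P^{k-\ell+1}f)\}.$$

As the signal is assumed geometrically ergodic, we have $\lambda(V)<\infty$ and

$$\|P^k(x,\cdot)-\lambda\|_V\le c_3\,V(x)\,\beta^k\quad\text{for all }x\in E,\ k\ge0,$$

for some constants $c_3<\infty$, $\beta<1$. In particular, we find that for any probability measures $\nu_1,\nu_2$

$$\|\nu_1P^k-\nu_2P^k\|_V=\sup_{|f|\le V}|\{\nu_1-\nu_2\}(P^kf-\lambda(f))|\le c_3\,\beta^k\,|\nu_1-\nu_2|(V)=c_3\,\beta^k\,\|\nu_1-\nu_2\|_V.$$

Therefore we can estimate

$$\|\mathbf{E}\pi^N_{k-}-\mathbf{E}\pi_{k-}\|_V\le\sum_{\ell=1}^k\|\mathbf{E}\pi^N_{\ell-1}P^{k-\ell+1}-\mathbf{E}\pi^N_{(\ell-1)-}P^{k-\ell+1}\|_V\le\sum_{\ell=1}^kc_3\,\beta^{k-\ell+1}\,\|\mathbf{E}\pi^N_{\ell-1}-\mathbf{E}\pi^N_{(\ell-1)-}\|_V.$$

In particular, we find that

$$\mathbf{E}\pi^N_{k-}(V)\le\mathbf{E}\pi_{k-}(V)+\|\mathbf{E}\pi^N_{k-}-\mathbf{E}\pi_{k-}\|_V\le\mu P^k(V)+\sum_{\ell=1}^kc_3\,\beta^{k-\ell+1}\{\mathbf{E}\pi^N_{\ell-1}(V)+\mathbf{E}\pi^N_{(\ell-1)-}(V)\}.$$

By the assumption of the lemma we now obtain

$$\mathbf{E}\pi^N_{k-}(V)\le\mu P^k(V)+\sum_{\ell=1}^kc_3\,\beta^{k-\ell+1}\{(c_1+1)\,\mathbf{E}\pi^N_{(\ell-1)-}(V)+c_2\}.$$

But $\mu P^k(V)\to\lambda(V)$ as $k\to\infty$, so $c_4=\sup_k\mu P^k(V)+c_2c_3\beta/(1-\beta)<\infty$. By lemma B.11 below

$$\mathbf{E}\pi^N_{k-}(V)\le c_4\exp\left(\sum_{\ell=1}^k(c_1+1)\,c_3\,\beta^{k-\ell+1}\right)\le c_4\exp\left(\frac{(c_1+1)\,c_3}{1-\beta}\right).$$

But as $\mathbf{E}\pi^N_k(V)\le c_1\,\mathbf{E}\pi^N_{k-}(V)+c_2$, the proof is evidently complete. □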

In the previous proof, we needed the following.

Lemma B.11 (Discrete Gronwall). Suppose $(A,\alpha_k,B_k)$, $k\ge0$ are nonnegative scalars such that

$$\alpha_k\le A+\sum_{\ell=1}^kB_\ell\,\alpha_{\ell-1}\quad\text{for all }k\ge0.$$

Then it must be the case that

$$\alpha_k\le A\exp\left(\sum_{\ell=1}^kB_\ell\right)\quad\text{for all }k\ge0.$$
Proof. As $\log(1+x)\le x$, it suffices to prove the first inequality in

$$\alpha_k\le A\prod_{\ell=1}^k(1+B_\ell)=A\exp\left(\sum_{\ell=1}^k\log(1+B_\ell)\right)\le A\exp\left(\sum_{\ell=1}^kB_\ell\right).$$

We proceed by induction. Clearly the statement is true for $k=0$. Now suppose we have verified the statement for all $\ell<k$. Then by assumption

$$\alpha_k\le A+A\sum_{\ell=1}^kB_\ell\prod_{r=1}^{\ell-1}(1+B_r)=A+A\sum_{\ell=1}^k\left\{(1+B_\ell)\prod_{r=1}^{\ell-1}(1+B_r)-\prod_{r=1}^{\ell-1}(1+B_r)\right\}.$$

But the rightmost expression is evidently a telescoping sum which reduces to

$$A+A\left\{\prod_{r=1}^k(1+B_r)-1\right\}=A\prod_{r=1}^k(1+B_r).$$

The proof is complete. □
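Lemma B.11 is easy to test numerically. The sketch below is our own illustration. It builds the extremal sequence that satisfies the hypothesis with equality. By the telescoping argument this sequence equals $A\prod_{\ell\le k}(1+B_\ell)$. The check then confirms that it stays below $A\exp(\sum_{\ell\le k}B_\ell)$ for randomly drawn nonnegative $A$ and $B_\ell$, with coefficients that do not depend on $k$, as in the statement of the lemma. The function names are hypothetical.

```python
import math
import random

def extremal_sequence(A, B):
    # alpha_k = A + sum_{l=1}^k B_l alpha_{l-1}, with B[l - 1] storing B_l:
    # the largest sequence allowed by the hypothesis of lemma B.11
    alpha = []
    for k in range(len(B) + 1):
        alpha.append(A + sum(B[l - 1] * alpha[l - 1] for l in range(1, k + 1)))
    return alpha

def bounds_hold(A, B, rel_tol=1e-9):
    # alpha_k <= A prod_{l<=k} (1 + B_l) <= A exp(sum_{l<=k} B_l) for every k
    for k, a in enumerate(extremal_sequence(A, B)):
        prod = A * math.prod(1 + b for b in B[:k])
        bound = A * math.exp(sum(B[:k]))
        if a > prod * (1 + rel_tol) or prod > bound * (1 + rel_tol):
            return False
    return True

def random_trials(trials=20, length=15, seed=0):
    # random nonnegative A and B_l (the B_l do not depend on k, as in the lemma)
    rng = random.Random(seed)
    return all(
        bounds_hold(rng.uniform(0.0, 5.0), [rng.uniform(0.0, 1.0) for _ in range(length)])
        for _ in range(trials)
    )
```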

It remains to show that $\mathbf{E}\pi^N_k(V)\le c_1\,\mathbf{E}\pi^N_{k-}(V)+c_2$. Here we distinguish between the two separate cases of assumptions 5 and 6. The results below complete the proof of theorem 4.3.

B.5.1. Case I. In the setting of assumption 5, the result is straightforward.

Lemma B.12. Suppose that assumptions 3 and 5 hold. Then $\mathbf{E}\pi^N_k(V)\le c_1\,\mathbf{E}\pi^N_{k-}(V)+c_2$.

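Proof. Note that

$$\pi^N_k(V)=\frac{\int V(x)\,\Upsilon(x,Y_k)\,\pi^N_{k-}(dx)}{\int\Upsilon(x,Y_k)\,\pi^N_{k-}(dx)}\le\frac{u_+(Y_k)}{u_-(Y_k)}\,\pi^N_{k-}(V).$$

We may therefore estimate

$$\mathbf{E}(\pi^N_k(V)\,|\,Y_0,\ldots,Y_{k-1})\le\pi^N_{k-}(V)\,\mathbf{E}\left(\frac{u_+(Y_k)}{u_-(Y_k)}\,\Big|\,Y_0,\ldots,Y_{k-1}\right)=\pi^N_{k-}(V)\int\frac{u_+(y)}{u_-(y)}\,\Upsilon(x,y)\,\varphi(dy)\,\pi_{k-}(dx)\le\pi^N_{k-}(V)\int\frac{u_+(y)^2}{u_-(y)}\,\varphi(dy).$$

Taking the expectation of both sides completes the proof. □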

B.5.2. Case II. In the setting of assumption 6, we will need the following result, whose proof we recall for the reader's convenience, to control the growth of the update step.

Lemma B.13 (Chebyshev's covariance inequality). Let $\psi,\phi:\mathbb{R}\to\mathbb{R}$ be nondecreasing functions and let $\nu$ be any probability measure on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. Then

$$\int\psi(x)\,\phi(x)\,\nu(dx)-\int\psi(x)\,\nu(dx)\int\phi(x)\,\nu(dx)\ge0,$$

i.e., the covariance of $\psi$ and $\phi$ is always nonnegative.

Proof. Note that

$$\int\psi(x)\,\phi(x)\,\nu(dx)-\int\psi(x)\,\nu(dx)\int\phi(x)\,\nu(dx)=\frac12\int\{\psi(x)-\psi(x')\}\{\phi(x)-\phi(x')\}\,\nu(dx)\,\nu(dx').$$

But by our assumptions the integrand is nonnegative, and the result follows. □
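Here is a quick exact check of lemma B.13 and of the symmetrization identity in its proof. It uses a hypothetical finitely supported $\nu$ and rational arithmetic; this is our illustration. The last assertion in the test reflects how the lemma is used in lemma B.14: pairing a nondecreasing function with a nonincreasing one flips the sign, so the covariance is nonpositive.

```python
from fractions import Fraction

def integral(g, nu):
    # nu: list of (point, mass) pairs describing a finitely supported probability measure
    return sum(m * g(x) for x, m in nu)

def covariance(psi, phi, nu):
    return integral(lambda x: psi(x) * phi(x), nu) - integral(psi, nu) * integral(phi, nu)

def symmetrized(psi, phi, nu):
    # (1/2) double integral of {psi(x) - psi(x')} {phi(x) - phi(x')} nu(dx) nu(dx')
    return Fraction(1, 2) * sum(m * mp * (psi(x) - psi(xp)) * (phi(x) - phi(xp))
                                for x, m in nu for xp, mp in nu)
```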

We obtain the following result.

Lemma B.14. Suppose that assumption 6 holds and $\mu(V)<\infty$. Then $\mathbf{E}\pi^N_k(V)\le c_1\,\mathbf{E}\pi^N_{k-}(V)+c_2$.

Proof. We choose $\varphi$ to be the Lebesgue measure and $\Upsilon(x,y)=q_\xi(\sigma(x)^{-1}\{y-h(x)\})$. Note that

$$a_1\,q(|\sigma(x)^{-1}\{y-h(x)\}|)\le\Upsilon(x,y)\le a_2\,q(|\sigma(x)^{-1}\{y-h(x)\}|).$$

As all finite dimensional norms are equivalent, we have $\kappa^{-1}\|v\|\le|v|\le\kappa\|v\|$ for all $v$ and some $\kappa>0$. Using $(a+b)^p\le C_p(a^p+b^p)$ where $C_p=\max(1,2^{p-1})$, we can therefore estimate

$$\|h(x)\|^p\le\{\|Y_k-h(x)\|+\|Y_k\|\}^p\le\{\varepsilon^{-1}\kappa\,|\sigma(x)^{-1}\{Y_k-h(x)\}|+\|Y_k\|\}^p\le C_p\,\varepsilon^{-p}\kappa^p\,|\sigma(x)^{-1}\{Y_k-h(x)\}|^p+C_p\,\|Y_k\|^p.$$

In particular, using that $V(x)\le b_3\|h(x)\|^p+b_4$, we find

$$\pi^N_k(V)\le\frac{C_p\,\kappa^p\,a_2\,b_3}{\varepsilon^p\,a_1}\,\frac{\int|\sigma(x)^{-1}\{Y_k-h(x)\}|^p\,q(|\sigma(x)^{-1}\{Y_k-h(x)\}|)\,\pi^N_{k-}(dx)}{\int q(|\sigma(x)^{-1}\{Y_k-h(x)\}|)\,\pi^N_{k-}(dx)}+C_p\,b_3\,\|Y_k\|^p+b_4.$$

But $q$ is nonincreasing and $v\mapsto v^p$ is nondecreasing, so by lemma B.13

$$\pi^N_k(V)\le\frac{C_p\,\kappa^p\,a_2\,b_3}{\varepsilon^p\,a_1}\int|\sigma(x)^{-1}\{Y_k-h(x)\}|^p\,\pi^N_{k-}(dx)+C_p\,b_3\,\|Y_k\|^p+b_4.$$

Now note that

$$|\sigma(x)^{-1}\{Y_k-h(x)\}|^p\le C_p\,\varepsilon^{-p}\kappa^p\,\|Y_k\|^p+C_p\,\varepsilon^{-p}\kappa^p\,\|h(x)\|^p\le C_p\,\varepsilon^{-p}\kappa^p\,\|Y_k\|^p+C_p\,\varepsilon^{-p}\kappa^p\,\{V(x)-b_2\}/b_1.$$

Substituting in the above expression, we obtain

$$\pi^N_k(V)\le\frac{C_p^2\,\kappa^{2p}\,a_2\,b_3}{\varepsilon^{2p}\,a_1\,b_1}\int V(x)\,\pi^N_{k-}(dx)+C_p\,b_3\left(1+\frac{C_p\,\kappa^{2p}\,a_2}{\varepsilon^{2p}\,a_1}\right)\|Y_k\|^p+b_4-\frac{C_p^2\,\kappa^{2p}\,a_2\,b_2\,b_3}{\varepsilon^{2p}\,a_1\,b_1}.$$

Finally, note that

$$\mathbf{E}(\|Y_k\|^p)\le C_p\{\mathbf{E}(V(X_k))/b_1-b_2/b_1+\varepsilon^{-p}\,\mathbf{E}(\|\xi_k\|^p)\},$$

which is bounded uniformly in $k$ by our assumptions. Therefore

$$\mathbf{E}\pi^N_k(V)\le\frac{C_p^2\,\kappa^{2p}\,a_2\,b_3}{\varepsilon^{2p}\,a_1\,b_1}\,\mathbf{E}\pi^N_{k-}(V)+C_p\,b_3\left(1+\frac{C_p\,\kappa^{2p}\,a_2}{\varepsilon^{2p}\,a_1}\right)\sup_{k\ge0}\mathbf{E}(\|Y_k\|^p)+b_4-\frac{C_p^2\,\kappa^{2p}\,a_2\,b_2\,b_3}{\varepsilon^{2p}\,a_1\,b_1},$$

which completes the proof. □